NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary

November 9, 2023

NASDAQ US Information Technology Semiconductors and Semiconductor Equipment special 60 min

Earnings Call Speaker Segments

Linda Watkins

attendee

#1

Hi, everyone. Thanks for joining. We'll get started in just a couple of minutes. Feel free to say hello to everyone on the chat. Thanks for joining everyone. I see quite a few people logging in now. We'll get started in just a minute. If you're just joining us, go ahead and say hello in the chat. We'd love to hear where you're dialing in from, and we will get started at the top of the hour. All right. Thanks, everyone, and welcome to our webinar today. I'm Linda Watkins, SVP of Marketing at Edge Impulse and a few announcements before we officially get started. If you're just joining us, go ahead and say hello in the chat, we'd love to know where you're dialing in from. Feel free to add anything you like about yourself, your role, projects you're working on. We'd love to hear more. This webinar is being recorded, and the recording will be available a little bit later today. So if there's any portion that you missed or if you want a coworker to listen to it, you'll have that recording via e-mail later today. Also, we have a couple of announcements. Everybody will be on mute, of course, except for the speakers. But if you do have questions, please submit them in the Q&A. We will make sure we have time to get to questions. And we'll have a couple of polls as well. So I will announce when we have those polls, you'll see it pop up on your screen, and we'd love participation from folks as well as hearing your questions. Okay. So officially, welcome to our webinar today: Fast Track AI to the Edge with NVIDIA TAO & Edge Impulse. We have a few really wonderful speakers that I'm happy to introduce. The first is Debraj Sinha. And he's a product marketing manager for Metropolis at NVIDIA and he focuses on building smarter spaces around the world with vision AI applications. Debraj collaborates with partners ranging from start-ups to Fortune 500 companies to market AI applications that drive safety and efficiency gains. So welcome Debraj. 
We also have Jan Jongboom. He is CTO and co-founder of Edge Impulse. He's an embedded engineer and machine learning advocate, always looking for ways to gather more intelligence from the real world. He has shipped devices, worked on the latest network tech and simulated microcontrollers. So welcome, Jan. And then we have Jenny Plunkett, a senior developer relations engineer at Edge Impulse, a technical speaker and content creator. Since graduating from the University of Texas, she has been working in the IoT space, from customer engineering and developer support to consulting engineering. Jenny is also the co-author of the O'Reilly book "AI at the Edge: Solving Real World Problems with Embedded Machine Learning". So welcome, Jenny.

Jenny Plunkett

attendee

#2

Thank you for having me.

Linda Watkins

attendee

#3

Of course. All right, we're going to start with Debraj from NVIDIA. So Debraj, I'll turn it over to you.

Debraj Sinha

executive

#4

Awesome. Thanks, Linda. Hi, everyone. I'm Debraj Sinha. I'm part of the Metropolis marketing team at NVIDIA. So today, I'd like to give you a quick overview of NVIDIA TAO. TAO is our open source, low-code AI training toolkit built on top of TensorFlow and PyTorch. TAO speeds up the AI training process with transfer learning. This powerful technique transfers learned features from an existing neural network model to a new customized one. So here, you can see the workflow for TAO. First, we have data. We have the option to either use real data or synthetic data. If you don't have enough real data, you can augment it with synthetic data. And this data can be plugged into TAO to quickly fine-tune and optimize your AI model. Then for the training process, you have a large selection of pretrained models available to you, ranging from convolutional neural networks to transformer models, that you can use to fine-tune your model. So TAO provides fine-tuning capability, and we have lots of optimization techniques, like model pruning, where you can reduce the size of the model. And then there is model quantization, where you can reduce the model to a lower precision, bringing it from FP32 to FP16 or even INT8, which reduces memory requirements, and it's really useful when you are deploying your models on memory-constrained microcontrollers. Then for output, you can export a model in ONNX format, and you can deploy it on a wide range of inference platforms like GPUs, CPUs, microcontrollers and more. So yes, this is a quick workflow for NVIDIA TAO. So let's talk about some of the features that we have. Yes. So TAO is packaged as a Docker container, and developers can use the container as is, or they also have the option to deploy it as a service using REST APIs.
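As a rough illustration of the quantization step described above (FP32 down to INT8), here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not TAO's implementation; the function names `quantize_int8` and `dequantize` and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale factor for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, chosen only for the example.
weights = [0.42, -1.27, 0.05, 0.9, -0.33]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Largest round-trip error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage cost: 4 bytes per FP32 value versus 1 byte per INT8 value.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
```

The memory saving is the 4x ratio between `fp32_bytes` and `int8_bytes`; the price is the small rounding error in `max_error`, which is bounded by half a quantization step.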
So for training and optimization, you can train the model from scratch, or you can take one of our pretrained models, fine-tune it on a new data set and then optimize it specifically for inference. And then we have this really cool feature, which is called AutoML. What AutoML provides is the ability to sweep a set of hyperparameters on your model. So you do some smart hyperparameter optimization to give you the most accurate model for a given data set. Then we have AI-assisted annotation, which provides you the ability to take an unannotated data set and generate a segmentation mask. So basically, the input image has bounding boxes in it, and the output has a segmentation mask. The main framework for TAO is based on PyTorch and TensorFlow. And the output can be in TensorRT, which is supported on all our GPUs, or the output can also be in the ONNX model format, which you can deploy on the wide range of inference platforms that I already mentioned, from GPUs to CPUs to microcontrollers and more. One thing I want to clarify is that to run TAO, to train your models with TAO, you need to use it on a GPU, whether that is on a workstation or in the cloud. But then after you have trained your model, you can deploy that model on any inference platform. So now let's talk about some other features of TAO. Yes. So it's really cool that TAO provides over 100 pretrained models, ranging from traditional convolutional neural networks to newer architectures like vision transformers. And we are seeing great promise with vision transformer-based models providing higher accuracy and higher robustness. So we have many vision transformer-based models that you can use for classification, object detection and segmentation use cases. We are also creating some new models that can be plugged into other vision modalities, which we call foundation models.
So these generative AI-based models are something that we started working on in the large model space. And now we are starting to see these large foundation models being used in the vision space, where pretrained models are trained on images and text. So let's take an example of a warehouse. You have this video analytics application where you're detecting human beings. So that's great. But what if we have to detect scooters, pallets, forklifts? Every time you need to detect a new class of object, you have to retrain your model or add new neural networks. This adds to the complexity and makes it really difficult for developers to deploy their models in production. So with these new generative AI-based models such as foundation models, because of their generalizable capabilities and zero-shot learning capabilities, you can just deploy the model, and it can detect a wide range of object classes. Hence, developers can go to market much faster and spend less time in development. So these are some of the models that we have available as pretrained models with TAO. And we have a Siamese model as well, which is very useful for comparing 2 objects. And we are seeing these being used in defect detection for manufacturing purposes. So these are some of the models; you can check them out on NGC. There are over a hundred of them. All our models are optimized to run across all our GPUs. And you can see a list of GPUs all the way from the higher-end ones, the very compute-intensive ones like from A100, A10100 and -- all the way to the embedded GPUs on the NVIDIA Jetson platform. Yes. So now, training and optimizing AI is a time-consuming process, requiring intimate knowledge of which model to use and which hyperparameters to tune. So you can train really high-quality models with AutoML, without the hassle of manually fine-tuning hundreds of parameters.
And this is what we call AutoML. We provide several algorithms for users to select and make it super easy for you to leverage this. So basically, what you do is you tell AutoML, hey, this is my data set. This is the model I want to train. And you can use one of our Jupyter Notebooks to quickly spin up a bunch of training runs. And AutoML works really efficiently. It will spin off multiple jobs in parallel. So before running AutoML, you have a baseline accuracy, and after running multiple models through AutoML, you can select whichever model has the best accuracy and works for your inference. And we always see a significant jump in accuracy. This is really useful for developers when they are deploying their model for inference, so that they can use the right model for them. And you can leverage practically all of these hundred-plus pretrained models with AutoML. You can also leverage our vision transformer-based models, which are marked in green in the table, like DINO, FAN, SegFormer. You can use all of these with AutoML as well. Yes. So we talked about how TAO simplifies the model training process. Let me talk about how it optimizes the model for inference throughput, on practically any platform. And this is really important when you're deploying on memory-constrained platforms such as a microcontroller. So we have a lot of optimization techniques, like model pruning, where you reduce the size of the model by removing some of the neurons in the network, bringing it down to a lightweight model that you can deploy easily on any platform.
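To make the pruning idea above concrete, here is a minimal sketch that removes the weakest neurons from a single layer, ranking neurons by the L2 norm of their weights. This is one common pruning heuristic, not necessarily the criterion TAO uses; `prune_neurons`, `keep_ratio` and the toy weights are assumptions for illustration.

```python
import math


def prune_neurons(layer, keep_ratio):
    """Keep only the neurons (rows of weights) with the largest L2 norm.

    `layer` is a list of rows, one row of input weights per neuron.
    Returns the pruned layer and the indices of the neurons that were kept.
    """
    n_keep = max(1, round(len(layer) * keep_ratio))
    norms = [math.sqrt(sum(w * w for w in row)) for row in layer]
    # Rank neurons from strongest to weakest by weight norm.
    ranked = sorted(range(len(layer)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original neuron order
    return [layer[i] for i in kept], kept


# Toy layer with 4 neurons and 3 inputs each (made-up values).
layer = [
    [0.9, -0.7, 0.4],
    [0.01, 0.02, -0.01],
    [0.5, 0.6, -0.3],
    [0.03, -0.02, 0.0],
]
pruned_layer, kept = prune_neurons(layer, keep_ratio=0.5)
```

In a real network the next layer's input weights would also have to be sliced to match the kept neurons, and the model is usually retrained afterwards to recover accuracy.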
And then we have model quantization, where you can reduce the model to a lower precision, so that we can bring a model from FP32 to FP16 or even INT8. While maintaining about the same accuracy, the model runs significantly faster with reduced memory requirements, which is very useful when you're deploying, again, on a wide range of inference platforms. So we already talked about how we can use TAO in a Docker container. The other way you can use it is through REST APIs. This is an easier way to deploy TAO in a cloud-native infrastructure using Kubernetes and integrate it into an application with the REST APIs. So how it works is you build your own UI on top of TAO, and TAO can basically run in a cloud or on a remote workstation. And these APIs enable you to build additional services on top of TAO, providing developers a lot of flexibility while you automate your training process. So there are a lot of opportunities to leverage TAO, through Docker containers or REST APIs, whatever fits your needs. Yes. So we are very excited about this collaboration between Edge Impulse and NVIDIA. We are changing the game for edge AI applications. NVIDIA TAO has been integrated into the Edge Impulse platform. Now developers get access to these hundreds of NVIDIA pretrained models, can leverage TAO for their training use cases, optimize their models much more easily, do everything in the Edge Impulse platform, and then deploy these models on a wide range of inference platforms, ranging from GPUs and CPUs even to microcontrollers. So yes, we have a very exciting demo to show you, where you can use a pretrained model, fine-tune it using TAO and deploy it on a microcontroller. Yes. Thank you so much for listening. Back to you, Linda.

Linda Watkins

attendee

#5

Great. Thanks, Debraj. Really appreciate that. One question that did come up is: does TAO stand for anything?

Debraj Sinha

attendee

#6

Yes, it stands for train, analyze and optimize.

Linda Watkins

attendee

#7

Great. Thank you. Reminder, if you have questions, put them in the Q&A, and we'll make sure that we have time to get to them. Before I turn it over to Jenny, we're going to launch a poll. So let's give this a try. Everybody should see a screen pop up that has the question, and we're talking about Edge AI use cases. So what use cases are you considering using Edge AI for? And then there's a list there: industrial IoT, predictive maintenance, health care and medical devices, computer vision, agriculture, robotics or something else, which you can feel free to put in the chat. So I'll give you just another couple of seconds to cast your vote, and then we'll look at the poll. Okay. I'm going to end the poll and share the results. Okay. Interesting. So a lot in computer vision and industrial IoT, that's great. And then some in health care and medical devices and robotics, and then fewer in agriculture. And I'll take a look at the chat later to see what other examples are there. So thanks for participating. We'll have another couple of polls after Jenny's presentation. So now I'll turn it over to Jenny.

Jenny Plunkett

attendee

#8

Great. Now that we have had the excellent presentation of what NVIDIA TAO is, what exactly is Edge Impulse? Edge Impulse is the leading edge AI and ML, or machine learning, platform. You can unlock massive value for your industry, as we just saw in the poll results, whether that's IoT or computer vision or health care devices, et cetera. Edge Impulse allows you to build real-world data sets at scale for your selected industry, quickly develop custom ML solutions and deploy intelligent edge products, all from the Edge Impulse platform. The Edge Impulse Studio powers the largest edge AI ecosystem, with more than 100,000 developers, 250,000 projects and 5,000 enterprises, and we're trusted by some of the coolest leading enterprises in the business. So what is Edge Impulse? Edge Impulse is the first fully integrated machine learning platform for the complete edge AI cycle. From ingesting your real-world raw data, to extracting your most important features via various digital signal processing techniques, to designing and training your machine learning algorithm and deploying onto your edge device, there are no black boxes. Edge Impulse is completely royalty free, and you can deploy anywhere, from things like microcontrollers all the way up to GPUs like NVIDIA devices. The entire active learning cycle can be developed using Edge Impulse: collecting your data sets, designing and training your machine learning algorithms and digital signal processing algorithms, testing and validating the performance of your trained model and deploying onto any edge device. And we also saw a similar diagram to this in Debraj's presentation, showing how TAO is pulled directly into Edge Impulse for designing and training your machine learning algorithm. So Edge Impulse allows you to build your valuable data sets at scale.
And we provide many valuable enterprise-grade tools for labeling and for ingesting your unlabeled or labeled data sets directly into the tool. We also provide many auto-labeling tools and integrations with the most widely used data science tools, like AWS, Hugging Face and more. We also have strong traceability and quality control, and we have a secure data exchange portal, because your data is some of your most valuable information for edge AI projects, and we want to make sure it is as safe and secure as possible. You can also design advanced algorithms with ML expertise directly in the Edge Impulse platform. There are no black boxes. All the code that you would get if you trained your machine learning algorithm on your own computer, by yourself, is provided to you. So all of the intermediate information, all of the parameter information and more is provided to you and is customizable within the Edge Impulse platform. You can also test and validate the performance of your model directly within the Edge Impulse tool and on your hardware. There's full visibility across the entire ML pipeline, and you can test your development data against 24 hours of real-world data, for example, if you're working on an audio project. You can also deploy your project or your machine learning model to any device with ease, from things as small as a Cortex-M0 microcontroller all the way up to a GPU or Linux device. We also provide comprehensive hardware support, benefiting the entire leading edge ML ecosystem, as you can see here on the right. For more information, you can check us out at edgeimpulse.com or our documentation for developers at docs.edgeimpulse.com. And now I'll hand it over to Jan, so he can show the platform.

Linda Watkins

attendee

#9

Before we do that, Jan, sorry, I'm going to do one more poll. So we are going to go to the next poll question, around NVIDIA: are you already using NVIDIA TAO pretrained models? So that's a yes or no. It should be a quick one. And I'll give everybody just a few more seconds to answer. All right, you should see that window pop up. Okay. I'm going to close the poll and share results. So most have not used NVIDIA TAO pretrained models yet. So that's great that you joined the webinar, and Jan is going to go into a demo. Back to you, Jan.

Jan Jongboom

attendee

#10

Yes. Thanks, Linda, Debraj and Jenny. Let's come to the fun stuff, where we can actually show some things. So for me, the really interesting bit about the 2 worlds of TAO and Edge Impulse colliding is that they are 2 really, really complementary tools. And that's why we started to work with NVIDIA 6 months ago on integrating this. So what Debraj has said is that TAO essentially gives you access to all the research that NVIDIA does in new model architectures, in training pipelines, in foundational models, in transfer learning models. And Edge Impulse builds everything else that you need in your machine learning pipeline. So the data collection interface, getting data from real devices in the field, organizing that data, labeling that data, making sure that stuff is not mislabeled in there, automatic active learning loops to get the data in. And then finally, the deployment step again. So TAO, at least from TAO version 5, is no longer limited to deploying on NVIDIA hardware; we can get an ONNX file out. But an ONNX file by itself won't run on hardware. You need a compiler toolchain or something that will convert that to some code that you can run on, for example, a small device, right, like a microcontroller or a DSP or maybe even the latest neural accelerators that you can get from some of our silicon partners. And that's where we kick in as well, right? So we will make sure that the model that you get out of TAO can run on the 50 or so different architectures that we currently support in Edge Impulse. And that's really cool, because every developer in Edge Impulse can use tools that they are already familiar with, like the 250,000 projects already created on Edge Impulse, but now actually leverage the state-of-the-art work that NVIDIA is doing on new model architectures, fully attentional networks, transformers, any of the really cool new stuff, and immediately leverage that within Edge Impulse. I think that combination is really, really powerful.
And that's what I want to show you here in the demo section. So the easiest way of getting started, and this is actually the Edge Impulse UI, is if you have either a pretrained model from the NGC catalog or a model you've already trained in TAO. So if you already have TAO running somewhere in your own cluster and you get an ONNX file out, you can deploy that to real hardware essentially in 5 minutes, using bring your own model. So let's say I make a new project. I can say upload your model. And what I can throw in there is just the ONNX file that I get out of the TAO toolkit or out of the pretrained model list on the NVIDIA website. As long as it's a TAO 5 model; there are still a few of them that are in etlt format, I think, that's pre-TAO 5, and those are encrypted. We're working with NVIDIA to get that in. You upload that, and you tell us where you want to run it. So you pick from the range of device types, and we'll tell you this model is small enough to fit on a mid-core MCU, or this model is actually so big, like with attention networks, that we can only run it on a GPU or on a neural processor. You pull that note on. So I've already done that here. Let's go check it out. So it will ask you a couple of questions about the model: what are the classes, what do we need to get this stuff to run. And here, the model that I found is an object detector and it's relatively big. So this won't actually run on the smallest MCUs or DSPs. That's fine, but it will actually run fine at about 2.5 frames a second on an MPU, or if you're running on a GPU or accelerator, at about 15 frames a second. If we want to try that out really quickly, there we go, and it actually detects beer bottles very nicely. Awesome. So if you have a model that you've already trained in TAO, and you now want to run that on this range of hardware that we have, that's basically it. So you have the deployments.
From here, we can find the full list of devices that we support, including some of the latest accelerators. I can build binary packages for Linux or macOS with full hardware acceleration. Jetson Nano, for example, if we want to run on NVIDIA hardware. Or the lowest common denominator is actually just a C++ library, a completely portable C++ library; any target under the sun that has a compiler will actually run that. Or if I want to run this model in the browser because I want to see how this model actually works, I can do that too. We'll compile it down to WebAssembly, using SIMD instructions actually, and now we can run that directly in there, and let's test it out. So this is actually a beer classifier. Blame me, it's 6:30 p.m. in Amsterdam where I am, so this is totally allowed, so take a look. There we go, a beer, could use some expression here, other beer, 2 beers. So to get this deployed to hardware, just find your deployment target. As this is a relatively big model, you probably want one of the default Linux deployment targets, so I can get one for AArch64 or ARMv7 or Linux x86, build it and done. This will now run on any Linux device under the sun, with full hardware acceleration integrated. Another interesting bit. So if you think about the foundational models that Debraj mentioned, for example, pose estimation as a foundational model, you don't want to go retrain that every time. You train that model once. And after that, it works. It does pose estimation. That's probably a model that you want to integrate into a larger system. So the way that we do that is that we can integrate it as sort of a feature extractor. So you have data flowing in. You have a feature extractor, in that case pose estimation, and then you train a classifier after that. So you can load this model as a feature extractor block in Edge Impulse, and then it will look like this.
This is Sean, one of my -- one of Jenny's colleagues in the DevRel team. And this actually runs pose estimation, gets the pose out, and you get features out describing his pose, 51 features. So this is just a feature extractor. In that case, I paired that with a classifier, and the classifier I can train in Edge Impulse. So in this case, I'm combining the work that NVIDIA has done in building really great foundational models, a really good pose estimation feature extractor, together with a classifier to go train on whatever weird shape I'm making with my hands and arms. So that's one way of integrating with TAO. The other thing that we can actually do here is that we say, well, I really like TAO and all the models, and I love that you can do transfer learning on them. I want to train these models in Edge Impulse, and we can do that, too. So as Jenny said, Edge Impulse contains tools that help you with everything you need in a machine learning pipeline, from initial data collection to modeling a machine learning algorithm. If you look at vision, either you use raw pixels as an input to your model or you use a feature extractor like the pose estimation model. But we do a lot more. If we look at audio, it's stuff like denoising, it's stuff like creating spectrograms of your audio to highlight frequencies that you think are interesting. If we think about [indiscernible] in health care, it might be analyzing the signal from a PPG sensor, like an LED that shines into your bloodstream, to get heart rate and heart rate variability features. If you're in industrial automation, it might be a current sensor or an accelerometer that is moving, and you want to do some frequency analysis based on that. So that's actually not machine learning per se, it's just plain old signal processing. That's stuff that we help you with, just to highlight the interesting stuff in your signal.
Then we train the model, an amazing place where we can use TAO, we help you validate that, and then finally deploy that in real life. So let's go look at how that would look if you actually want to train a model from scratch using TAO. First thing is, I need to get data. So here's a bunch of photos of me in a variety of settings. If I want to collect more data, there's a variety of ways. One is that I can just connect a device, so I can use my phone here, I can use my computer, or a device or development board. So once again, there is a wide range of things here, from development boards to actual cameras. For example, if you have an ICAM-500 from Advantech, there are instructions on how to get data from that into your project straightaway. Or if you have the data sitting somewhere in the cloud, we can actually get the data straight from there. And this is awesome; this is what you want if you want these active learning pipelines. Any time you see something on a real camera that you've deployed and you think, hey, I don't really know what it is, or the confidence that I have in this result is not high enough, awesome, push that to Amazon S3 or Google Cloud. It will automatically be fetched and thrown back into the project, so your model will get better over time. So let's actually just grab a quick photo of myself. And that's going to be me. I want to make sure they're captured. Now I've got a couple of photos of me. Now this assumes that you already know the label of your data, right? So that's relatively easy. I'm standing in front of the camera, so of course I can label it, it is going to be Jan. If you collect data from the field, then that's probably not going to be as easy. So we've built a bunch of tools like the data explorer, which gives you one overview of all the data in your data sets and helps you label the data as well.
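The active learning loop described above, where low-confidence results from a deployed camera are sent back for labeling, can be sketched as a simple confidence filter. This is an illustrative sketch only: the function names, the 0.8 threshold and the sample predictions are assumptions, and the actual upload to a cloud bucket is left out.

```python
def needs_review(probabilities, threshold=0.8):
    """Return True when the model's top class probability is below `threshold`."""
    return max(probabilities.values()) < threshold


def split_for_relabeling(predictions, threshold=0.8):
    """Split (sample_id, class probabilities) pairs into confident and uncertain IDs."""
    confident, uncertain = [], []
    for sample_id, probabilities in predictions:
        if needs_review(probabilities, threshold):
            uncertain.append(sample_id)
        else:
            confident.append(sample_id)
    return confident, uncertain


# Made-up predictions from a deployed "Jan / not Jan" classifier.
predictions = [
    ("frame_001", {"jan": 0.97, "not_jan": 0.03}),
    ("frame_002", {"jan": 0.55, "not_jan": 0.45}),
    ("frame_003", {"jan": 0.10, "not_jan": 0.90}),
]
confident, uncertain = split_for_relabeling(predictions)

# In a real pipeline, the samples listed in `uncertain` would be pushed to
# cloud storage so they can be pulled back into the project and labeled.
```

Only `frame_002` falls below the threshold here, so it is the one sample a human would be asked to look at.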
Shouldn't have added more data, because now it has refreshed, but that's fine. We'll go back to that. So that's one part, right? It can help you label normal data with that. Another place where we can do that is -- so this is a simple classifier, it's either Jan or not Jan, something we can do on a relatively small amount of compute. If you say, hey, I want to do an object detector, so like the beer bottles or something, I want a bit more complicated labeling pipeline. And I don't want to go draw boxes around my head all the time, like this is Jan, this is my thumbs up. So one way where we can actually leverage foundational models is that we say, well, we use a foundational model that actually can do segmentation. And we use that to find all the interesting objects in your data sets, and you just label the groups of objects. So let's say that I have some data here of the beer bottle again, and my AirPods, and a bunch of photos of just my desk. And I don't want to go draw bounding boxes around each of them, so I can go to the auto labeler. And it has found all of these objects using a foundational model. We ran this on a GPU, so it goes fast. And the only thing I need to do here is say, "hey, that's actually a bottle." Those are my AirPods. And those are also my AirPods, and let's see if I have another bottle, full image. It doesn't look like it, save samples, and there we go, a perfectly labeled data set in seconds. So once again, a place where we can use these foundational models to help you get a high-quality data set quicker. So we built the data sets, and now we want to go and train a model on them. And this is where TAO kicks in, right? So we, in Edge Impulse, ship 10 different architectures for computer vision applications; as Debraj said, in TAO there are over 100. So it means that the number of model architectures that we support has all of a sudden gone up 10x, which is absolutely amazing.
So the way that we do that is that we have built a couple of GitHub repos that actually map to TAO repos. So as Chintan said, TAO is essentially a set of Docker containers. And they're segmenting them by use case and then by training framework. So there is one for classification models built on TensorFlow 1. There's another one for classification models built on PyTorch. There is another one for object detectors. There's another for segmentation models, et cetera. So we've created GitHub repos that map to those. So if you have an interesting model in TAO, say a FAN network, for example, the FAN ones are in the PyTorch version of the container. So you find the repository that actually maps to whatever is in TAO, and you follow the instructions here in this GitHub repo to publish this as a block into Edge Impulse. Once you've done that, that model is actually available in the UI to go train. And you might think, hey, well, why do I need to do this step of actually having a GitHub repository and pushing that? Well, it's because we want to have some flexibility in how you use TAO. So if you've ever used TAO, you're probably really familiar with this model config. And that model config is a JSON file -- or YAML file -- that TAO actually uses to configure the TAO toolkit. So it has stuff like, hey, what's the optimizer that's used to train this network, right? What's the decay? What's the momentum? That's one of the things that we have here. What is the learning rate? It might be a very simple one, but I might actually have multiple learning rates in a model. So using these GitHub repos, if you're familiar with TAO and you read through that, it will probably look very familiar. We actually give you full flexibility in all the options that TAO provides. Let's find out. Once that is pushed into Edge Impulse, it will just look like this. So I have my TAO models.
You have a [indiscernible] model, but I also have the FAN attention network sitting in here. I'm going to select that. And I am going to train this model. So I've done that here. Where possible, we use transfer learning. So we don't train this complete model from scratch. We take a fully trained model that the TAO team has already built, and we freeze the bottom layers. So we don't change all the layers that deal with detecting shapes and contrast and whatnot. We don't really know what's going on in a neural network, but more or less, that's what they do in the bottom layers of the network. And then we unfreeze the last part, and that's the part that we retrain with your own data. That's a really clever trick, because it means that you can train models with a lot less data. If you want to train a neural network completely from scratch, especially these really large ones, you'll need tens or hundreds of thousands of images, but in this case we can fine-tune on just a few hundred images, which is what we've done here. This is 300 images of me and not me, enough to actually fine-tune this model. We'll show you the accuracy and the loss and, as Debraj said already, we do stuff like quantization as well here. So we quantized, in our case, to int8. You can run other stuff in the TAO toolkit, like pruning the models. This is actually a model that I want to deploy to a Himax DSP. So the one that I have here, it's really small, and it has a camera module on it. In that case, pruning actually doesn't make any sense because we don't -- yes, because we can run this much faster, we actually don't have anything pruned. We get some feedback on the RAM use and the flash use, and normally also inferencing time. In this case, we use the quantized version. The optimized version actually has the same accuracy. So in this case, we have very little validation loss.
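The freeze-then-retrain trick above can be sketched without any ML framework: mark every layer except the last few as non-trainable and count how many parameters are left to learn. The `Layer` class, the layer names and the parameter counts below are invented for illustration; a real TAO or Edge Impulse training job handles this internally.

```python
class Layer:
    """A stand-in for a network layer: a name, a parameter count, a trainable flag."""

    def __init__(self, name, n_params):
        self.name = name
        self.n_params = n_params
        self.trainable = True


def freeze_backbone(layers, n_trainable_tail):
    """Freeze every layer except the last `n_trainable_tail` ones."""
    first_trainable = len(layers) - n_trainable_tail
    for index, layer in enumerate(layers):
        layer.trainable = index >= first_trainable
    return layers


# Hypothetical pretrained model: three feature-extraction blocks and a small head.
model = [
    Layer("conv_block_1", 10_000),
    Layer("conv_block_2", 50_000),
    Layer("conv_block_3", 200_000),
    Layer("classifier_head", 2_562),
]
freeze_backbone(model, n_trainable_tail=1)

trainable_params = sum(layer.n_params for layer in model if layer.trainable)
total_params = sum(layer.n_params for layer in model)
```

With only the head left trainable, about 1% of the parameters in this toy model need to be fitted, which is why a few hundred images can be enough.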
So in this case, we can actually use a much faster model with basically no loss in accuracy, which is really awesome. I might think, hey, there's actually a few options that I have here, right? So like the number of training cycles, the learning rate and the alpha of the model, like the width of this model. And these will differ per TAO model. And you have some freedom in how you expose that. And, as Debraj said, we have AutoML for this. We don't need to go by hand through all of this: okay, let's train a model with alpha 0.25, let's see what it does with the RAM and ROM of the model and the accuracy, maybe it gets better, maybe it gets worse. Rather, we can use a tool to do it automatically. And we've integrated AutoML into the EON Tuner. The EON Tuner is essentially a way to go over all of these parameters really quickly, train all these models in parallel and then see which one performs best. So if I go to the settings here, I have some basic settings. So I know that I want to run this on the Himax WE-I target. I have my time per inference, which is limited here to about a second per inference, which is way too much, but that's fine. And then I have some settings that I set here. So I want to consider, like, do I want to train this model in RGB or in grayscale? That's an interesting question to ask, right? Do I actually need the full color resolution for this model? If I train this model on grayscale, then I can throw away 2/3 of my input data. That's awesome. Should my learning rate be 0.0005 or 0.0001? I don't know, let the computer figure it out. Should I use alpha 0.05, 0.1 or 0.25? I don't care, right? Go figure that out for me. So you set it up. It's really easy. Let's say you want to consider another TAO model. I also want to take the FAN model that I have here. Awesome, click here, boom. We've added this model. We've added the parameters that you can actually set here, and the EON Tuner will train that model for you as well.
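To make the size of that search concrete, here is a minimal sketch that enumerates the options quoted in the talk (two color depths, two learning rates, three alpha values). The dictionary keys, function names and the 96x96 input size are assumptions for illustration and are not the EON Tuner's actual configuration format.

```python
from itertools import product

# Option values quoted in the talk; the key names are hypothetical.
SEARCH_SPACE = {
    "color_depth": ["RGB", "grayscale"],
    "learning_rate": [0.0005, 0.0001],
    "alpha": [0.05, 0.1, 0.25],
}


def enumerate_candidates(space):
    """Return one dict per combination of options (a plain grid search)."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def input_values(width, height, color_depth):
    """Number of input values per frame: 3 channels for RGB, 1 for grayscale."""
    channels = 3 if color_depth == "RGB" else 1
    return width * height * channels


candidates = enumerate_candidates(SEARCH_SPACE)
print(len(candidates))                    # 2 * 2 * 3 combinations
print(input_values(96, 96, "RGB"))        # assumed 96x96 input, three channels
print(input_values(96, 96, "grayscale"))  # one channel: a third of the RGB input
```

Even this small grid yields twelve training runs per model architecture, which is why running them in parallel and comparing the results automatically is attractive.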
So it's really easy to compare different TAO models in Edge Impulse quickly. Afterwards, we get a model comparison out. So let's actually look at the test set accuracy for int8. And we see that the RGB model with alpha 0.1 and the grayscale model with alpha 0.1 actually have the same accuracy on the test set. So in this case, I can safely use my grayscale model over the RGB model. This is a relatively simple data set, as you can see. But if you have lots and lots of data and lots of different model variants, this is an incredibly useful tool to find a model that not just gets the best accuracy but also fits your latency, RAM and ROM constraints. Maybe you have a model that does something on production line monitoring, and you need to have a response within 50 milliseconds. If it takes 60 milliseconds, you don't even want to consider that model. So that's the kind of stuff that you can throw into the EON Tuner. And with TAO you have this huge model catalog, which you can compare there -- in the model...
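The selection rule described here (best accuracy among the models that fit the latency, RAM and ROM budget) can be sketched as follows. The `Candidate` fields, the function name and all numbers in the usage note are hypothetical; only the 50 millisecond budget comes from the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # test set accuracy, 0..1
    latency_ms: float  # time per inference on the target
    ram_kb: float
    rom_kb: float


def pick_model(candidates, max_latency_ms, max_ram_kb, max_rom_kb):
    """Drop candidates that exceed any budget, then take the most accurate.

    Ties on accuracy are broken in favor of the lower latency.
    Returns None when nothing fits.
    """
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms
        and c.ram_kb <= max_ram_kb
        and c.rom_kb <= max_rom_kb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, -c.latency_ms))
```

With a 50 ms budget, a 60 ms RGB model is filtered out before accuracy is even compared, so a grayscale model with the same accuracy and lower latency wins.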

Linda Watkins

attendee

#11

Jan, would you mind closing your inferencing so we can get your video back?

Jan Jongboom

attendee

#12

Wait, what?

Linda Watkins

attendee

#13

Your video webcam has paused -- sorry. Maybe you can close the tab for your brb -- oh, perfect. Thank you. Sorry.

Jan Jongboom

attendee

#14

I don't know what happened. Okay. Cool. Yes. So let's actually go deploy it on the Himax target. So we have a model that we're happy with, 98% accurate on the validation set. I can find the Himax target. So for fully supported development boards, and that's stuff that we work on with partners, it's 30-plus boards or something, we can give you a binary with the complete client code for the driver, for the camera in this case, and for running the model. So you select that, you hit build and you'll get a binary file. And I've done that already for the sake of time. So when I'm happy to run this on device, I can now push it here. And this is a DSP. So this is not a processor running Linux. This is a DSP from Himax running at 400 megahertz. So let's actually run that on device. We get a little device view of the camera, and that's Jan, and it is no longer Jan. So we know that this works. It runs in about 230 milliseconds, so about 5 frames a second on this Himax target. Really awesome. Now we have the same model. I found a model that I found interesting in TAO, I fine-tuned it in Edge Impulse. I looked at what parameters I need to get it to run on hardware. And then I compiled it down to a binary so it can run on the Himax target. That's browser window. Now the interesting bit here is that we have so many targets here, right? So these are all the officially supported MCU targets and the officially supported CPU and GPU targets and accelerators, production targets like production cameras and whatnot in here. So if you want to run this on any device under the sun, I'd like to say we've got you covered. The last thing that I want to highlight here is that, as Jenny said, training a model is not just a one-time thing, right? You want to do this more often. So typically, what I recommend to people is, after you've set up your first model, you have something that works. Awesome.
Okay, let's make sure that we get data from the camera and collect that on a daily basis. Like, you see something weird, you have something where the confidence is not high enough, okay? Awesome. Set up a data source, right, fetch the data, get some data from an upload portal or an organizational data set or somewhere in the cloud. Afterwards, you can recreate the data explorer, see what the unlabeled data in the data set is, and we can help you label that. If there's new data, retrain it automatically. Awesome, and then create a new deployment. So every day create a new deployment for whatever C++ library, awesome, run it every day and then send a message to a webhook somewhere in my device management system, so I can update all my devices in the world. So now they have a self-learning model, and that is a really, really cool thing. Some interesting bits that we have here, for example, is that we can do model tracking over time as well. So every day, we'll look at your models, like, hey, what was the training set accuracy on October 29? Awesome. What was the test set accuracy on that day? If there are changes, awesome, alert me, send me a notification on all of that. So a lot of the interesting bits around active learning are sitting in Edge Impulse to make these models better over time. There is very little time, but the main thing is Linda will throw up the CTA at the end of the webinar on how you can try all of this out yourself. For me, I think the combination of Edge Impulse and TAO is really cool because it gives every Edge Impulse user the ability to use these 100-plus models from their catalog and access the latest research from NVIDIA. And I think for anyone who's considering TAO, Edge Impulse is by far the easiest way to get started, and definitely the easiest way to deploy your TAO models to your device. And with that, I'll give it back to Linda for Q&A.
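The "alert me if the accuracy changes" idea from this segment can be sketched as a comparison of consecutive daily records. This is a hypothetical illustration, not Edge Impulse's implementation: the record layout, the function name and the 0.02 threshold are all assumptions.

```python
def accuracy_alerts(history, threshold=0.02):
    """Compare each day's accuracies with the previous day's.

    history: chronological list of (date, training_accuracy, test_accuracy).
    Returns one message per metric whose change reaches the (assumed) threshold.
    """
    alerts = []
    for prev, cur in zip(history, history[1:]):
        for label, idx in (("training", 1), ("test", 2)):
            delta = cur[idx] - prev[idx]
            if abs(delta) >= threshold:
                alerts.append(f"{cur[0]}: {label} accuracy changed by {delta:+.3f}")
    return alerts
```

In a daily pipeline, messages like these would be what gets posted to the webhook after the retrain step; where exactly they are sent is left open here.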

Linda Watkins

attendee

#15

Great. Thanks, Jan. And we have quite a few questions that have come through. And just before we get to the questions, we're going to do the last poll, so I'm going to jump into that real quick. Are you already using Edge Impulse? So we'd love to know if you're using Edge Impulse or not. Yes or no question, we'll give you just a minute to answer. And then also I put it in the chat, and I'll show it on the screen, but if you go to edgeimp.com/trial, you can, of course, try out Edge Impulse and NVIDIA TAO pretrained models for free. It's an enterprise trial. There's also the option for the community account as well. So make sure you go there. Ending the poll. Okay, about a 60-40 split. So quite a few already using Edge Impulse. But of course, this is an opportunity, if you're already using it, to explore the NVIDIA TAO models, and if you aren't using it yet, please go to that link. And I will share that link now, and then we'll jump into the Q&A. Great. So here is the link to go to and questions.

Linda Watkins

attendee

#16

All right. I'll let the -- there's a question about pricing, licensing, that sort of thing. So Jan, maybe you can talk more about Edge Impulse pricing and NVIDIA TAO pricing? Is it free to use pretrained models for NVIDIA TAO?

Jan Jongboom

attendee

#17

Yes. I don't want to go sit on Chintan and Debraj's chair, but I'll answer the Edge Impulse one. So Edge Impulse has a [indiscernible] that's paid per project, very simple, for private projects. That's what's currently needed for the TAO integration because TAO can only train on GPUs, and we only offer GPUs currently on our enterprise plan. In addition, we have a community plan. People can train on CPUs there. So unfortunately, you don't have access to the TAO stuff, but for smaller companies and for people who do stuff in the open source world, you can use community there. And I'll defer to Chintan and Debraj on the TAO licensing. One thing I want to say: any model you get out of Edge Impulse, we deliver it as source code where possible, royalty-free. So once you deploy, no royalties.

Chintan Shah

executive

#18

Yes. I can answer the question on the TAO part. So if you use TAO as a part of Edge Impulse there's no separate licensing or additional cost. It will all be factored into whatever the enterprise plan that Edge Impulse charges. So there's no separate license agreement that you have to agree to.

Linda Watkins

attendee

#19

Thank you. Okay. Now a question around how to handle post processing operations that can exist in the models when deploying on an ML accelerator like DSP?

Jan Jongboom

attendee

#20

Yes. So what you get out of Edge Impulse is a library, just source code, C++, that contains all the preprocessing, signal processing and ML. It might be neural networks for computer vision, but it also might be Gaussian mixture models for anomaly detection or classical ML things for time series sensor data. So we give you that as a package. You put sensor data in and you get probabilities out. But what you do with that is up to you, right? You write your code as is. So if that's on an MCU or DSP, you'll use some C++ to do that. If you do this on MPU or NPU targets, you can use our Python SDK or Node.js SDK or Go SDK or C++ SDK to kind of wrap that all together. That's it.
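As a small illustration of "you get probabilities out, and what you do with that is up to you", the sketch below applies a confidence threshold to a per-label probability map. It does not use any Edge Impulse SDK call; the function name, the dictionary layout, the label names and the 0.8 threshold are assumptions.

```python
def postprocess(probabilities, min_confidence=0.8):
    """Pick the most likely label, or report 'uncertain' below the threshold.

    probabilities: mapping of label -> probability, as a classifier might return.
    """
    label, score = max(probabilities.items(), key=lambda kv: kv[1])
    if score < min_confidence:
        return "uncertain"
    return label


print(postprocess({"jan": 0.93, "not_jan": 0.07}))
print(postprocess({"jan": 0.55, "not_jan": 0.45}))
```

The same logic would be written in C++ on an MCU or DSP; the point is only that post-processing lives in application code around the generated library.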

Linda Watkins

attendee

#21

Great. Would it make sense to integrate TAO's generative models into data prep tools to create synthetic data for training? Probably a question for...

Jan Jongboom

attendee

#22

Oh yes -- yes maybe for Jenny, yes.

Jenny Plunkett

attendee

#23

Yes. So I'm not sure of the answer on the generative AI models that are present in TAO. However, I have been working closely with the NVIDIA Omniverse team on creating more efficient pipelines for creating synthetic data sets, specifically for computer vision applications. So a few weeks ago, we did another webinar with the Omniverse team on how NVIDIA Omniverse can be used to accelerate your synthetic data set generation pipeline, from data set collection in Omniverse into your Edge Impulse projects and deployment to your real-world edge device. So I recommend checking that out. We have more documentation that I'll post a link to in the chat, specifically about the Omniverse piece. But maybe Chintan could speak about the other models present in TAO that could help with synthetic data set generation.

Chintan Shah

executive

#24

Yes. For synthetic data generation from the TAO side, right now, most of the models are really focused around creating AI models for specific tasks. We are in the process of adding some models, some kind of foundational models, which can help you generate synthetic data. And once those are added, you can use them, and we'll have APIs for users to use them. But at this point, I think most of our models are really focused around creating AI models for deployment use cases.

Linda Watkins

attendee

#25

One very important question for Jan. Will the Jan detector work if you wear glasses?

Jan Jongboom

attendee

#26

That is a good question, I don't know. I don't have glasses here. But I mean, here's the thing, right? So what I like to say with this stuff is that a lot of the use cases that we see with customers is that they want constrained models for constrained problems. And I think the Jan versus not Jan detector is a really good example of that. Let's say you're monitoring your production line or something. What I want to know is, like, is the label attached properly to the bottle, yes or no. Very constrained problem, right? I don't need to detect it at your office walking on online production line. It's not going to happen anyway. I just need to know, is there a label on it, yes or no. So it's relatively easy to just get 98% coverage or something of all the cases that will actually happen on that line. Like here, right? I just need to know Jan versus not Jan. I don't care about Jan with glasses, it's not going to happen anyway in this case. So constrained models for constrained problems, in a sense. Not a real answer, but just something I wanted to say.

Linda Watkins

attendee

#27

Thanks, Jan. When developing custom DSP blocks, does Edge Impulse help with cross validating between embedded and behavioral like Python implementations?

Jan Jongboom

attendee

#28

Yes. That's a really good question, Kevin. So yes and no. So for all the blocks that we support, and we ship a whole bunch of these things, like, for example, for spectral analysis, audio analysis and whatnot, we ship both Python examples and highly optimized C++ examples, and those are guaranteed to be correct. If there's any difference in that that's bigger than 0.05% or something, we'll address it as a bug. And we deliver that code open source, and you can look at that. If you have custom DSP code, there's 2 ways that we can integrate it into Edge Impulse. One is just that you send the Python code as a block to us, and you hook the C++ code into your firmware. In that case, there's nothing really that we can do to validate that. That's on you. We also have a way of putting the C++ code as a DSP block in Edge Impulse, which our DSP team is working on. In that case, it's easy, right? It's guaranteed to be the same thing. And we have bindings, like using CMSIS-DSP from C++ in Edge Impulse as well, if you want to make sure that it actually works under CMSIS-DSP. That's stuff that we do internally. And last, and that's something that we're experimenting with, is the idea to cross-compile your Python code to highly efficient C++ source code using JAX. That's really cool because that allows you to write your DSP code in Python as usual, and we guarantee that the output on device is fast, uses vectorization and whatnot, and still runs on an MCU or DSP. We've used this ourselves for stuff like our anomaly detector blocks, which are written in Python and cross-compiled to C++. And we're experimenting with this for DSP code as well. So if someone is interested in that, hit us up, I'd be happy to talk through that in more detail.
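A cross-validation check of the kind described here can be sketched as a relative-difference comparison between the feature vectors produced by two implementations. The 0.05% figure is the one quoted in the talk; the function names and the way the two outputs are obtained are assumptions, since the actual test harness is not described.

```python
def max_relative_diff(reference, candidate, eps=1e-12):
    """Largest element-wise relative difference between two feature vectors."""
    if len(reference) != len(candidate):
        raise ValueError("feature vectors must have the same length")
    return max(
        (abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate)),
        default=0.0,
    )


def implementations_match(reference, candidate, tolerance=0.0005):
    """True when the outputs differ by at most 0.05% (the figure from the talk)."""
    return max_relative_diff(reference, candidate) <= tolerance
```

In practice `reference` would come from the Python block and `candidate` from the embedded C++ build, run over the same raw samples.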

Linda Watkins

attendee

#29

Thank you. There were a couple of questions on pricing and also where to get access to knowledgeable experts. So for enterprise pricing, please go to our website, edgeimpulse.com. You can contact us, talk to a salesperson. We're happy to talk through your use case and what that looks like. For answers to questions, we have our docs, extensive doc pages. But you could also go to our website and request a demo if you want an in-person conversation, digging into more details about your particular use case. So feel free to go there.

Jan Jongboom

attendee

#30

I'd just say it's a subscription basis. I saw that question as well. Subscription basis per project per month, no royalties after.

Linda Watkins

attendee

#31

Great. Thanks, Jan. There are a couple of questions that came through for Debraj during your presentation. So we might be missing a little bit of the context now. So I'll ask these questions, but if more context is needed, we can have those folks that submitted the questions add more. One question was: is there a quantization option for 16-bit or 4-bit?

Debraj Sinha

attendee

#32

Yes, we do actually. So you can reduce the model precision from FP32 to FP16 and even int8. Chintan, do we have anything for int4?

Chintan Shah

executive

#33

No. We don't support int4. We are not able to get good accuracy with int4. So int8 is probably the lowest we support right now. And we have quantization-aware training as well as post-training quantization. So quantization-aware training kind of factors in the quantization loss during the training process so that when you convert your floating point to integer, you retain the same level of accuracy, whereas with post-training quantization, there could be some accuracy loss. So we do support quantization-aware training. So those are the lower precisions that we support as a part of TAO.
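To illustrate where the accuracy loss in post-training quantization comes from, here is a minimal sketch of symmetric int8 quantization of a list of weights and the round-trip back to floating point. It is a generic textbook scheme, not TAO's quantizer; the function names and the example weights are assumptions.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)
print(max(errors))  # rounding error, bounded by about scale / 2
```

Small weights such as 0.003 collapse to zero here. Quantization-aware training simulates this same rounding during the forward pass, so the network can adapt its weights to it before conversion.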

Jan Jongboom

attendee

#34

And another downside of going below 8-bit is that hardware acceleration is often not there. So on MCU and DSP, for example, [indiscernible] CMSIS-NN, that operates on 8-bit numbers and doesn't have any hardware acceleration for 4-bit at the moment. So we're tracking that, so if something like that would ever become available and fast in hardware, we'll do it. For example, we do 4-bit quantization on BrainChip hardware already because they have that in hardware. So this is somewhere where kind of the 2 worlds can collide.

Linda Watkins

attendee

#35

Great. A question around how we can deploy in the range of devices. So what range of NVIDIA Jetson devices are supported, like do you support the latest Orin AGX?

Debraj Sinha

executive

#36

Yes. So all models that have been trained on TAO can be deployed on any Jetson device. So just to clarify for training, you need a GPU, you cannot train the model on a Jetson. You need to train it on a GPU, either in your workstation or in cloud. And then after training that model, you can deploy it on any GPU, on any Jetson platform. So yes, Jetson Orin is absolutely supported.

Linda Watkins

attendee

#37

Great.

Chintan Shah

executive

#38

Yes. And I can even extend that: Jetson, absolutely, any NVIDIA GPU is supported from TAO, but we are also extending to other platforms, and as Jan showed, you can also deploy it on a DSP or an Arm NPU. So the possibilities are endless on the platforms that you can deploy these models on.

Linda Watkins

attendee

#39

Well, we only have a couple of minutes left, so I'll open it up to our wonderful team of speakers here if there's any final thoughts you want to give. And just a reminder to folks, if we didn't get to your questions, and there's still information you'd like, you can always go to our website and request a demo. We'd be happy to talk with you. And of course, you can get started right away at edgeimp.com/trial. So Jan, any final thoughts.

Jan Jongboom

attendee

#40

So I mean, we are a small team sort of. Well, not so small anymore, we're 75 people, which is already way bigger than I ever envisioned this to be. But we have limited capacity in adding new models and whatnot. So what gets me super excited with TAO is that through TAO, you get access to the smartest people in the world: the NVIDIA engineers designing these new network architectures, the people building these training pipelines. So that gets me super excited. This is something that we've always been struggling with. There's always something new to chase. And giving access to that to all developers, that got me really excited basically from the moment that me and Jenny and Chintan met at the NVIDIA office in California a few months ago.

Linda Watkins

attendee

#41

Thanks. Debraj, any final thoughts?

Debraj Sinha

attendee

#42

Yes. I am excited about this collaboration. It's opening the gates for our developers to leverage TAO and deploy these models on a wider range of inference platforms. So yes, this is like a game changer for edge AI applications.

Linda Watkins

attendee

#43

Great. And Jenny?

Jenny Plunkett

attendee

#44

I don't have anything else, but I really appreciate everyone for coming and asking such insightful questions. It really helps. And I hope you use Edge Impulse. And please let us know your feedback via the form on our website.

Linda Watkins

attendee

#45

Yes. Thanks, all. Appreciate your time. Like I mentioned before, the webinar has been recorded, we'll send out the recording a little bit later today. And then, of course, we'll send a link as well if you want to continue the conversation. Thanks all.

Jenny Plunkett

attendee

#46

Thank you.

Debraj Sinha

executive

#47

Thank you.
