NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary

February 23, 2023

NASDAQ US Information Technology Semiconductors and Semiconductor Equipment special 56 min

Earnings Call Speaker Segments

Unknown Attendee

attendee
#1

Hi, everyone. Thank you for joining us today for our Metropolis webinar. I'm Debraj Sinha, and I'm part of the Metropolis team here at NVIDIA, and I'll be the moderator for today's session. Before we begin, I'd like to cover a few housekeeping items. All the windows on your screen are resizable and movable. At the bottom of your screen are some icons that offer more information. We really want this to be as interactive as possible, so if you have any questions or comments, please submit them throughout the talk using the Q&A window located to the right of the slide. We'll answer the questions after the presentation. Let's get started. We have some exciting topics to cover. We have created this amazing end-to-end AI workflow that makes it easier and more cost-effective for app developers to build Vision AI solutions. Starting with creating synthetic data, we have Adam Scraba, who leads Metropolis Marketing. He's going to talk about how we can generate synthetic data with Isaac Sim. And then we have Khoa Ho, who's a product manager at NVIDIA and who will talk about Metropolis Microservices. And then in the end, we'll have the Q&A session. So let's get started. Over to you, Adam.

Unknown Executive

executive
#2

Thanks, Debraj, thanks so much. Let's get started here. We want to give you some key updates to Metropolis as we continue to focus on streamlining Vision AI app development end-to-end: the entire journey of building AI and improving the workflow, from data set creation and training, to bringing down solution costs through tools that increase throughput, to tools to build cloud native and then deploy from the cloud all the way to the edge. And we think that a really important area, one that is exploding with investment, research and tools, certainly by us, is here, very early in the development cycle: how we think about and build our data sets. This session, or at least a large portion of it, is on this next frontier for how computer vision will be built, at the generate stage, with simulation and synthetic data generation. So simulation is a requirement for any of us building products and services, and we are seeing it explode across so many areas. We need to simulate our physical world before we build it, and then we continue to simulate after it's built, whether it's robots or autonomous vehicles, simply because we don't have enough time in our life to train in the physical world. In the case of vehicles, we can't drive them millions of miles to bring out our next model of vehicle. In the visual world, where our products need to see, simulating what our products see and how to better interpret that vision, that perception layer, is critical. And even in the communications space, we've been simulating things like 5G interactions with city environments and Internet traffic. So simulation is here and it's only going to get more and more important. For us in the visual world, where a lot of you joining this call work, and certainly our team in Metropolis, there are a few key drivers, or key challenges, in creating Vision AI solutions. 
One is that we, you, produce products and services that promise to solve a problem in not just average situations, but most situations, and ideally all situations. And while we may have a decent amount of real data for average scenarios and common occurrences, corner cases happen and represent a really important differentiator for us as we try to differentiate our products in the marketplace. And second, simply the cost of gathering enough real data to achieve the accuracy we need is often prohibitive, and it is time-consuming. Finding good data can be extremely hard. It's a manual process. As an example, NVIDIA's PeopleNet pretrained model, the one that you might use when you use our TAO Toolkit, is made up of 2.5 million images. That's a ton of images to gather in the real world. It's a huge expense. And lastly, environmental diversity, the visual diversity that AI needs to remain accurate, is a huge challenge. We often operate in 24/7 conditions in our infrastructure, under various lighting conditions, weather and different camera types. Now the very near future for computer vision is simulation. And this is where AI and digital twins are today colliding and intersecting, and increasingly so. Simulation is really, for us, effectively a way to create practically infinite scenarios, camera angles, lighting conditions, sensor and noise imperfections, the list goes on, all with ground truth annotation inherently built in, so what was previously impossible with traditional approaches is now possible. The answer for many development efforts is combining transfer learning and pretrained models, and augmenting them with synthetically generated visual data and some curated real-world data, which is a big trade-off. 
We're trading huge amounts of manual costs for gathering and labeling and curating real data sets for what is effectively an automation tool, an AI automation tool, a path to lower cost using simulated photorealistic imagery, seeing objects and environments. And the result, in many cases, is slashed cost and time. Cost of curating the data set, labeling it. And instead, we're using synthetic -- we generated data with some huge -- or hugely shrunken amount of real-world data. So now let's talk about some really interesting trends we're seeing. AI increasingly is being done in both the real world -- actually, I'm going to reset here. It was the wrong video that was playing. AI increasingly is being done in both the real world and in the Metaverse, the digital twin of a space or piece of infrastructure or process. PepsiCo, as an example, builds 3D synthetically generated images of their products to streamline how they train vision AI models deployed in their warehouses, and they deploy the same AI models in both their real warehouses and in the simulated 3D models of their warehouses, the digital twins. And it's -- importantly, it's the same model in both the digital twin and in the real warehouses. Another example, one of the largest supermarket chains in the United States, Kroger, uses Isaac Sim to do scenario planning in their stores to optimize customer checkout experiences and generate training data for their computer vision models in the real world. They take CAD models of their store, bring it into NVIDIA Omniverse, add materials, lighting and generate photorealistic stores and drop shoppers in the scene, script behaviors, allowing them to test various checkout configurations and find optimal configurations to reduce checkout times. So this is both a scenario planning tool but also a tool to help them train models in the real world. Simulation is also being used to build and test out new vision AI algorithms. 
In this recent Metropolis Microservices demo that I'm showing here, a retail store was simulated, complete with simulated shoppers and behaviors, where we test out a multi-camera tracking algorithm. Tracking the journey of multiple shoppers across the store, moving in and out of various camera views but continually tracked across the store, is a really hard problem to capture, certainly with complete ground truth. But with fully simulated scenes, it's a lot easier. And so here, we used a synthetically generated store with Isaac Sim people animation to test out this new algorithm and show that it worked. For simplicity, I just showed one shopper being tracked, but all shopper journeys can be tracked. So many, many journeys can be tracked all at once. Another case is capturing corner cases. It's also a very important use of simulation and synthetic data generation. Often the situation we want to be incredibly good at identifying rarely happens in the real world. So we have very little real image data to train on. Here, a customer partner, Deutsche Bahn, the national rail operator of Germany, uses simulation in Omniverse to simulate just that: dangerous but rare occurrences on their tracks and stations. And sometimes, the infrastructure that we need to continuously monitor, identify and classify visually degrades significantly over its lifetime, in this case, license plates. We still need to read the license plate across a wide range of defects. A partner of ours uses Omniverse Replicator to simulate cars and license plates with a wide range of realistic damage, scratches and dents to train better models. So what all these examples have in common is simulation: using NVIDIA Omniverse, Isaac Sim and Replicator, pulling this all together to create synthetically generated scenes and scenarios, and ultimately helping us create a richer data set at a lower cost. So what is Isaac Sim, and how does it connect to Metropolis? 
Well, Isaac Sim is a simulator built on top of Omniverse. You may have seen it before. It was originally built for robotics, but it has incredible capabilities for streamlining vision AI and computer vision development, regardless of whether you build robots that move or not. A lot of us, it turns out, are actually in the robot business, but the robots we build are maybe for a street corner, or maybe for public safety in an office building. These are automated pieces of infrastructure. They just happen to not move, but perception is still incredibly important. Isaac Sim helps us build photorealistic digital twins, where we can do scenario testing and/or generate synthetic data for training our computer vision models. We use it to set up scenes and populate them with assets like people and infrastructure. You set up cameras and sensors and generate automatically annotated data sets, and it integrates incredibly easily into your workflow, assuming you're building on top of Metropolis. Now let's step through some of the workflow. So there's a tool for scene generation. There's also a tool for asset population, so you can populate the scene with SimReady assets. A growing set of assets is being added to Isaac Sim all the time. I happen to be showing a clip that shows a lot of warehouse-focused assets, but we've added people, and it continues to expand. You can also import your own assets, or bring them in through third-party tools as well. And I'm going to skip to a different video. There's also a tool for scene generation by which you can procedurally generate many scenes with the free content, using a tiling method. The tiling randomization creates the different floor plans that you can generate. And once the tiles are set, obstacles can be randomly spawned in certain tiles. So we have asset creation, we have scene creation, and putting that together is pretty powerful. 
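The tiling idea described above can be illustrated with a small, self-contained Python sketch. This is not Isaac Sim code: the tile catalogue, the `generate_floor_plan` function and the rule that obstacles only spawn on walkable tiles are all assumptions made for illustration.

```python
import random

# Hypothetical tile catalogue; a real scene generator would use its own assets.
TILE_TYPES = ["open_floor", "aisle", "shelf"]
WALKABLE = {"open_floor", "aisle"}


def generate_floor_plan(rows, cols, obstacle_prob=0.2, seed=None):
    """Randomly tile a rows x cols floor plan, then spawn obstacles.

    Each cell is a dict with the chosen tile type and an obstacle flag.
    Obstacles are only spawned on walkable tiles (an assumed rule).
    """
    rng = random.Random(seed)  # a fixed seed makes the scene reproducible
    plan = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            tile = rng.choice(TILE_TYPES)
            obstacle = tile in WALKABLE and rng.random() < obstacle_prob
            row.append({"tile": tile, "obstacle": obstacle})
        plan.append(row)
    return plan


if __name__ == "__main__":
    # Print a compact view: X = obstacle, otherwise first letter of the tile type.
    for row in generate_floor_plan(3, 5, seed=7):
        print(" ".join("X" if c["obstacle"] else c["tile"][0] for c in row))
```

Changing the seed yields a different floor plan with the same building blocks, which is the property that makes procedural generation useful for data set diversity.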
Once we have our scene, our scenarios, our lighting and other environments, we want to simulate. With Isaac Sim, we have the ability to align very closely to our real-world sensors, whether it is an RGB camera, a depth camera, fisheye or LiDAR, and we can get completely annotated data: instance segmentation, bounding boxes, poses and more. And the important thing is that, obviously, we create our scene and we populate it. But it is really magical to be able to close the sim-to-real gap when we focus our simulated sensor on exactly the type of sensor that we're going to be deploying in the real world. It's incredibly powerful and it's one of the superpowers, I think, of Isaac Sim and the tool set. To summarize, what we have is this tool that helps us along the 4 main workflow steps: scene creation; domain randomization, where we can randomize lighting, materials and colors, spin our models around and really create a lot of diversity in our data set; ground truth annotation, perfectly labeled and annotated for our training; and output of the data into the right formats, the formats that integrate really well into TAO. It is a completely streamlined workflow. So let's actually step through and dive into a few things that are new and exciting. I want to spend a little bit of time diving into a new extension of Isaac Sim, the people animation extension. This is the omni.anim.people extension, where we place animated people into our simulated world. It has a UI for configuring simulation settings, behavior scripts that control characters, and preconfigured character assets and animations. And what we do is we import a USD scene, and you can use other formats that can be converted to USD format. The lighting of the scene can be described in the USD file, but it also can be tinkered with and altered after the scene is opened in Omniverse as well. And then we populate assets. 
We can populate any asset, human, robot, box, building or piece of infrastructure, that's in USD format. A suite of assets already comes with Isaac Sim and Omniverse: prebuilt scenes and assets that range from people to warehouse infrastructure. In the case that I just showed you, we placed a few people in the scene, and we're getting them ready to interact in and with the space, in this case, a warehouse. Then we run the simulation. We literally hit play on the left nav bar. A sequence of actions can be scripted for each character, things like sit, stand, walk, look around, queue. And it includes static and dynamic obstacle avoidance. I don't know if you just noticed that those characters were intended to walk into one another, but the characters are environment-aware, and they'll avoid each other as well as static and dynamic obstacles. And of course, after all this is done, the output data set is created, the objects are annotated, and we've got an amazing new data set. And obviously, for us, it's a brand new workflow. So once we have the synthetic data, we move on to the training phase and transfer learning with NVIDIA TAO, where we can combine the synthetic data set, which could be a huge data set that we've now created, with alternative camera views and a lot of diversity, a lot of scenarios, and we combine it with pretrained models and iteratively refine our model. And hopefully, TAO Toolkit is not new to you. It's our low-code AI toolkit that enables you to train models using simple commands, or you can take one of the many sample Jupyter notebooks that we have and modify it to your training needs. It has data augmentation built into it that lets you increase the size of your data set, either changing the color of images, saturation and hue, or the physical size, shear, crop, rotation, et cetera, helping generalize the model without having a large manually gathered data set. 
And then you can prune and quantize the model for your target platform and really boost throughput and, not ideally but exactly, lower total solution cost. And Isaac Sim supports annotation in KITTI format, which is accepted into TAO. So it's a really nice, tight integration between Isaac Sim and TAO. And I want to kind of put all the tools together in an example workflow for you. Defect detection, in this case in manufacturing, is just one example, but a really good one. In defect detection, the thing you're trying to be very good at finding, as I mentioned earlier in another case, statistically doesn't happen very often. And replicating it in the real world is hard and very costly. Building AI models for it can be very challenging with traditional approaches. But with rigid bodies, which obviously tend to crop up a lot in industrial settings, and fairly constant lighting in these industrial situations, simulation is really ideal. You can build models with a majority of synthetic data and just a little tuning with real data. So on the left, it starts with capturing and building a 3D model of an object, setting up a scene and determining where you want your defects to be. Then you procedurally generate a large and very specific set of defect scenarios. This case is an electronic device with some thermal paste being applied to it, and getting that thermal paste just right is really important, of course, for the thermal properties of that product. But you can place the defects, and the extent of the defects, procedurally in a script so you can create a vast amount of variation in defects. Then, in the third column, you render it with annotations, all in Omniverse Replicator or Isaac Sim. And then you take the synthetic data set into TAO, train it, and perhaps do some fine-tuning with some real data. And that's an iterative process within TAO. 
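To make the annotation hand-off mentioned earlier in this passage more concrete, here is a minimal Python sketch that writes one 2D bounding box as a label line. It assumes the annotation format referred to is the KITTI label format (15 space-separated fields per object) and that only the class name and 2D box matter for 2D detection training, with the remaining fields set to zero. The function name `kitti_label_line` is hypothetical and is not part of Isaac Sim or TAO.

```python
def kitti_label_line(class_name, left, top, right, bottom):
    """Format one 2D box as a 15-field KITTI label line.

    Fields not needed for 2D detection (truncation, occlusion, alpha and the
    3D dimensions, location and rotation) are written as zeros.
    class_name must not contain spaces, since fields are space-separated.
    """
    if right <= left or bottom <= top:
        raise ValueError("box must have positive width and height")
    fields = [
        class_name,                # object type
        "0.00",                    # truncated
        "0",                       # occluded
        "0.00",                    # alpha (observation angle)
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",  # 2D box
        "0.00", "0.00", "0.00",    # 3D dimensions: height, width, length
        "0.00", "0.00", "0.00",    # 3D location: x, y, z
        "0.00",                    # rotation_y
    ]
    return " ".join(fields)


if __name__ == "__main__":
    # One label file per image, one line per annotated object.
    print(kitti_label_line("thermal_paste_defect", 120, 64, 188.5, 140))
```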
And then you deploy the model, and it's optimized for the target platform and TensorRT and our Triton Inference Server. And altogether, you've got -- for the first time you've got this really powerful generate-train-optimize-deploy workflow, all within Metropolis and incorporates Omniverse, and you can deploy that model from the edge to the cloud. And given the TAO is extremely obviously intimate with the target platform, you can optimize it incredibly well and reduce cost and maximize throughput. So simulation is important now, and it's getting more important. We encourage you to get started with NVIDIA Isaac Sim today. Getting access to the individual license is easy and it's free with access for collaboration for up to 2 users. And there's an enterprise path. There's an enterprise license for teams when you get to that point. The links to a lot of the assets that I've shared with you, the links to Isaac Sim developer pages will be in the resource center in this webinar page. It will be included in the PDF as well. And definitely, when you get to Isaac Sim and you start to look at Omniverse, there's a host of amazing FAQs that we encourage you to look at, there is a lot of great assets the team has put together to get you started. There's documentation, tutorials, blogs, videos, and all the links, as I said, are available in the resource center of this webinar. So we encourage you to get started. It's one of these things that -- it's not if, I think it's a when for simulation and synthetic data generation. And the tools are out there. They're ready. Isaac Sim is available. It's in beta, but it's definitely part of the omni -- Isaac Sim is available. The people, the omni and in -- people extension's in beta, but you can absolutely get access to it. We encourage you to do so. So with that, I'm actually going to pass it over to Khoa to cover a bit of an SDK update and really have a focus on our new Metropolis Microservices offering.

Unknown Attendee

attendee
#3

Thank you, Adam and Debraj. So for this part of the session, I'm going to talk about the subsequent steps of the Vision AI application development journey. You have seen how you can use Omniverse Replicator to synthetically generate data to complement your real-world data sets, and use the low-code TAO Toolkit to help train and fine-tune your models. The next part is using Metropolis Microservices to build your cloud-native application end-to-end and to deploy and scale easily. Even in normal circumstances, end-to-end application development, in addition to model building and data collection, can be a very time-consuming and resource-intensive part. And since synthetic data can drastically reduce the time and resources for the earlier stages of your application development journey, the application building part can become more of a bottleneck. Our hope with the release of this new set of microservices and reference applications is to help you accelerate this process. So looking at this diagram from the bottom up, starting from the bottom right corner, you see some of our Metropolis developer tools and SDKs. Some of them you're probably more familiar with: TAO Toolkit, as Adam just mentioned, for model training; TensorRT and Triton for model optimization; DeepStream for real-time AI streaming pipeline creation; and other tools to help you scale and develop a full cloud-native application. In this product, we offer a collection of microservices. These are pipelines that you could develop on your own or through DeepStream, and they have been containerized and packaged with Helm charts for easy deployment in the cloud, at the edge or in any hybrid environment. 
So they span from your more familiar single-camera perception, which helps you detect and track objects or generate their feature embeddings; to microservices that help you track objects across multiple cameras; to microservices that help you perform behavior analytics on objects, determining whether an object is entering a region of interest or crossing a virtual tripwire, and generating events, alerts and anomaly detections to alert the operator. There is behavior learning, which uses machine learning to reveal patterns and predict behavior from metadata and not just detections; similarity search for feature-based visual association; and tools for video management and search. Based on this collection of microservices, we provide a set of reference applications, or AI workflows, to help you more easily see how we can combine these modular microservices, or building blocks, in different ways to create end-to-end applications that serve a wide range of use cases. And these applications are cloud native, with well-defined REST API interfaces for you to integrate seamlessly with or within your existing applications and solutions, business services and apps, as well as your suite of sensors. Let's take a closer look at some of these microservices that I've mentioned, from perception to behavior analytics. If we look at the detection microservice, we leverage the DeepStream SDK to build this perception pipeline. The DeepStream SDK provides several hardware-accelerated blocks to process streaming video data into insight. This can reduce pipeline latency and increase throughput. The input to this pipeline is an RTSP stream from cameras. Once the data is captured, frames are decoded and stored in memory. From there, the data is preprocessed, and this could be scaling down the resolution before batching or sending it for inference. This is a multistage AI pipeline where we first do people detection using PeopleNet, our state-of-the-art model for people detection that Adam mentioned earlier.
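The region-of-interest and tripwire checks mentioned earlier in this passage reduce to two standard geometry tests. The Python sketch below shows one common way to implement them (an orientation-based segment intersection test and a ray-casting point-in-polygon test). It is an illustration only and is not taken from the Metropolis behavior analytics microservice; the function names are hypothetical.

```python
def _orient(a, b, c):
    """Signed area test: > 0 if c is left of the line a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses_tripwire(p_prev, p_curr, wire_a, wire_b):
    """True if the move p_prev -> p_curr strictly crosses segment wire_a-wire_b."""
    d1 = _orient(wire_a, wire_b, p_prev)
    d2 = _orient(wire_a, wire_b, p_curr)
    d3 = _orient(p_prev, p_curr, wire_a)
    d4 = _orient(p_prev, p_curr, wire_b)
    # The endpoints of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def in_roi(point, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    roi = [(0, 0), (4, 0), (4, 4), (0, 4)]
    print(in_roi((2, 2), roi))                               # point inside the square
    print(crosses_tripwire((0, 0), (2, 2), (0, 2), (2, 0)))  # path crosses the wire
```

In practice these tests would be applied to consecutive positions of a tracked object, for example the bottom center of its bounding box, to raise an event when the result changes.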
PeopleNet has been trained on a combination of real-world data sets and a massive corpus of synthetically generated data as well, to ensure robustness and generalization. And then we track each object within each single-camera stream across frames using the NvDCF tracker. It's a highly accurate and high-performance object tracker, also available in DeepStream. Finally, once the objects are tracked with a unique ID, we crop each object and then send it to our feature extraction model, people re-identification, to generate feature embeddings. This is also a state-of-the-art model for people ReID, tested on the public data set Market-1501. And once the embeddings are generated, we then generate the metadata. For each object in a frame, the metadata follows a well-defined schema. It consists of frame ID, sensor ID, timestamp, object bounding boxes, detection confidence, object ID from single-camera tracking and feature embedding. And these data are then transmitted to downstream microservices over a Kafka message broker. So at a high level, you can see that it's not trivial to create a high-performance and accurate real-time pipeline for Vision AI processing. And we provide this as a microservice that's ready to be deployed and configured and that integrates seamlessly into your application to help you with this part. But going beyond just perception, how do we generate rich behavior metadata from the detection stage? The behavior analytics microservice that we provide receives single-camera tracking metadata from a message broker and processes it into behavior metadata and embedding metadata. A behavior can be considered as a tracklet or trajectory that includes the detected objects with the same object ID from a single-camera tracklet.
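As a rough picture of the per-frame metadata listed above (frame ID, sensor ID, timestamp, bounding boxes, detection confidence, object ID and feature embedding), the following Python sketch models one message and serializes it to JSON. The field names and layout are assumptions for illustration; they are not the actual Metropolis schema, and the Kafka producer call that would publish the message is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DetectedObject:
    object_id: int            # ID assigned by single-camera tracking
    bbox: List[float]         # [left, top, right, bottom] in pixels
    confidence: float         # detection confidence in [0, 1]
    embedding: List[float] = field(default_factory=list)  # ReID feature vector


@dataclass
class FrameMetadata:
    frame_id: int
    sensor_id: str
    timestamp: str            # e.g. an ISO 8601 string
    objects: List[DetectedObject] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the frame, including nested objects, to a JSON string."""
        return json.dumps(asdict(self))


if __name__ == "__main__":
    msg = FrameMetadata(
        frame_id=120,
        sensor_id="cam-01",
        timestamp="2023-02-23T10:00:04.000Z",
        objects=[DetectedObject(7, [100.0, 50.0, 180.0, 300.0], 0.93, [0.1, 0.2])],
    )
    print(msg.to_json())  # this string is what would be sent to the broker
```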
When the calibration information is available, the image pixels are converted to physical coordinates in order to project the objects to their location on a shared ground plane. The ReID embeddings of the same behavior are then summarized and normalized into just a handful of embeddings to represent the entire behavior. This is primarily done to reduce the number of embeddings. For example, a 30-second clip running at 30 frames per second would generate 900 embeddings, 1 per frame. So to store and retrieve them efficiently, we run incremental clustering to summarize and normalize the embeddings. This reduces space and also speeds up search. So once again, you can see that it's not trivial to create an application or a small pipeline to do behavior analytics generation and processing. So we provide this microservice with the hope that it helps more developers easily go beyond perception and integrate higher-order reasoning and analytics in their applications. And if you want to go beyond processing and extracting insight from a single camera, and combine the fields of view and the detections from multiple cameras to enable full space awareness, we provide a multi-camera tracking microservice. This includes several modules, starting with pixel-to-physical coordinate transformation. The objects are projected to the physical coordinates of a shared ground plane using the provided calibration information, just like in behavior analytics. So all cameras, or rather the application, would have the same understanding of the physical space. The next stage is to preprocess and filter single-camera trajectories. The behaviors are filtered based on mean detection confidence and bounding box size. The next step is aggregation and normalization of the different embedding and behavior data points. And next is association based on spatio-temporal cues.
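The embedding arithmetic above (30 s at 30 frames per second gives 30 x 30 = 900 embeddings) and the idea of summarizing them incrementally can be sketched in a few lines of Python. The greedy scheme below, which assigns each new embedding to the most similar existing cluster or opens a new one, is a simple stand-in chosen for illustration; the source does not say which clustering algorithm the microservice uses, and the class name and threshold are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


class IncrementalClusters:
    """Greedy online clustering of embeddings by cosine similarity."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.sums = []     # running sum of normalized members, per cluster
        self.counts = []   # number of members, per cluster

    def add(self, embedding):
        e = l2_normalize(embedding)
        best, best_sim = None, self.threshold
        for i, s in enumerate(self.sums):
            sim = dot(e, l2_normalize(s))
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:           # nothing similar enough: open a new cluster
            self.sums.append(e)
            self.counts.append(1)
        else:                      # fold the embedding into the running sum
            self.sums[best] = [a + b for a, b in zip(self.sums[best], e)]
            self.counts[best] += 1

    def centroids(self):
        """One normalized summary embedding per cluster."""
        return [l2_normalize(s) for s in self.sums]


if __name__ == "__main__":
    n_frames = 30 * 30  # 30-second clip at 30 fps, one embedding per frame
    clusters = IncrementalClusters()
    for i in range(n_frames):
        jitter = 0.01 * (i % 3)
        # Toy data: the appearance changes once, halfway through the clip.
        clusters.add([1.0, jitter] if i < n_frames // 2 else [jitter, 1.0])
    print(n_frames, "embeddings summarized into", len(clusters.centroids()))
```

Because each cluster keeps only a running sum and a count, the summary can be updated frame by frame without storing all 900 vectors.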
Synchronous trajectories that are close to each other, spatially and temporally, are clustered. And finally, the behaviors are clustered using the ReID appearance features that I mentioned earlier in the detection pipeline. This module does the final matching of the objects and assigns a global ID to similar objects, so if the objects are similar, they have the same global ID. In addition, the output metadata also tracks all the behaviors from each single-camera tracker for the matched global ID. So using this microservice, a developer can easily generate a unique global ID from a collection of cameras for each object detected within the single-camera streams, along with all the associated per-sensor behavior IDs, behavior and detection data. So even though each microservice is already a pretty sophisticated, easy-to-use building block, to make it easier to see how one can integrate these building blocks into a new application or your existing application, we also created a set of AI workflows, which are cloud-native, prepackaged reference applications to help accelerate your AI solution development. There's a multi-camera tracking application for a retail setting or for warehouses and logistics; occupancy analytics, which covers how people move and occupy spaces within an environment and the movement patterns of people and objects; and a reference application that uses these microservices for a retail loss prevention use case. For multi-camera tracking, at a high level, we integrate the detection and single-camera tracking module as well as the multi-camera tracking and behavior analytics, along with a storage and retrieval infrastructure to help you easily detect and query objects by example. 
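A heavily simplified Python sketch of the association step described above follows. It merges tracklets from different cameras when they are close on the shared ground plane at the same timestamps and also look alike, then numbers the merged groups as global IDs. The data layout, thresholds and the single-pass union-find grouping are assumptions for illustration; the actual microservice is described as clustering in two stages and is certainly more elaborate.

```python
import math
from itertools import combinations


def mean_distance(t1, t2):
    """Mean ground-plane distance over timestamps present in both tracklets."""
    shared = set(t1["positions"]) & set(t2["positions"])
    if not shared:
        return math.inf  # never observed at the same time
    return sum(math.dist(t1["positions"][t], t2["positions"][t]) for t in shared) / len(shared)


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def assign_global_ids(tracklets, max_dist=1.0, min_sim=0.7):
    """Return one global ID per tracklet, in input order."""
    parent = list(range(len(tracklets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(tracklets)), 2):
        a, b = tracklets[i], tracklets[j]
        if a["sensor_id"] == b["sensor_id"]:
            continue  # only associate across different cameras
        if mean_distance(a, b) <= max_dist and cosine(a["embedding"], b["embedding"]) >= min_sim:
            parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(tracklets))]


if __name__ == "__main__":
    tracklets = [
        {"sensor_id": "cam-1", "positions": {0: (1.0, 1.0), 1: (1.5, 1.0)}, "embedding": [0.9, 0.1]},
        {"sensor_id": "cam-2", "positions": {0: (1.1, 1.0), 1: (1.6, 1.1)}, "embedding": [0.8, 0.2]},
        {"sensor_id": "cam-2", "positions": {0: (9.0, 9.0), 1: (9.0, 8.5)}, "embedding": [0.1, 0.9]},
    ]
    print(assign_global_ids(tracklets))  # the first two tracklets share a global ID
```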
Since Adam has shown the demo earlier, I won't go too deep into this demo, but essentially, we use a synthetically generated scene to demonstrate both the capability of the multi-camera tracking application and how you can use such an environment to train and generate the data that is used for your models and your application in the first place. So again, it serves both purposes, in the development process as well as in the deployment and validation process. Taking a deeper look at the architecture of the application, you can see that there are 2 main modules: single-camera perception, and multi-camera tracking and analytics. They are essentially composed of the Metropolis Microservices that I mentioned earlier, including the detection and tracking microservice and another one for video management and storage. There's a message broker that takes information from behavior analytics and multi-camera tracking, as well as the Logstash, Elasticsearch, Kibana stack to help you store, process and visualize the metadata, and a microservice that presents a web API interface for users to easily interact with the whole application and extract the right data, either to use as is or as the first step of further downstream processing in your own application, plus a reference web UI. Let me go to the next slide. The next application that is included in the Metropolis Microservices product is occupancy analytics. As you can see from left to right, you go from perception, detecting people in the scene, to exploring one level deeper with the analytics: how many people are in a scene at one point, the average velocity, directions. And the next stage is to generate the heat map of how people move and occupy the space over time, to generate an even deeper level of insight into the space. And this application follows a very similar structure to the multi-camera tracking application earlier. 
By just replacing the multi-camera tracking module with a microservice called behavior learning, which uses clustering and machine learning to reveal and predict behavior patterns, you have essentially a new reference application with some similar and some entirely new capabilities on a very similar application infrastructure. We are still reusing the video management infrastructure, the detection and tracking microservice and all the middleware infrastructure, from message broker to datastore to web API and web UI. And something I didn't mention earlier is that each component can be deployed either at the edge or in the cloud. You probably want to deploy the sensor processing closer to the edge, where it manages a collection of cameras, for latency, data privacy and data bandwidth reasons. The behavior analytics can also be at the edge, collecting data from multiple single-camera perception deployments or multiple clusters of cameras. Or it can be deployed in the cloud, because the metadata from each perception module is pretty lightweight and can be easily sent to the cloud for more scalable, distributed or centralized processing. And yet another application is retail loss prevention. At a high level, the goal is to allow existing self-checkout kiosks to visually confirm products. So you need a typical kiosk, cameras, and an AI system that can adapt during deployment using limited data for each class and often without retraining. Because in retail, the environment can be very dynamic. Data changes constantly, and new products are introduced, along with new designs, over time. So you need something that can learn with limited retraining or, put another way, an application with few-shot learning capability. 
So how do we leverage some of the microservices that I mentioned earlier to enable such a complex vision AI application, one that can extract visual features from the cameras and then combine those data with a barcode scanner to recognize the objects, evaluate when to trust the camera and when to trust the barcode scanner, perform appropriate corrections, and then monitor and store the results? So similar to the other 2 applications I've mentioned earlier, the demo -- can you go back to the slide? Okay. Let's quickly look at the demo and see how such applications can be architected. In this video, you can see that a person is scanning retail objects: alcohol pads, laundry detergent. To simulate this process, we upload a video to stand in for real-time cameras. Looking at the result, you can see that the visual detection system correctly identifies the objects as alcohol pads because these objects are already available in the database. And the POS signal also indicates alcohol pads. So we've seen that we have 300 or so objects as alcohol pads and laundry detergent. When users scan new objects, the system initially fails to visually detect any of them and has to depend entirely on the barcode scanner cue. By inserting a couple of detections of the new objects into the database, with their feature embeddings, you can use these as reference data for detection in subsequent scenes when the system sees the same objects. As you can see, the database now has 10 new feature embeddings of new objects. And then, as you can see, it has successfully been able to detect the new objects. So that's the demo of how you can use sensors to reinforce each other. The architecture for such an application can also be similar to the previous 2 applications I mentioned earlier. You still need a service to manage the incoming video, and another one to detect and generate embeddings for objects.
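The demo flow just described, where unknown products are enrolled from the barcode cue and later recognized visually, can be sketched as a small embedding database with a similarity threshold. The Python below is an illustration under assumptions: the `ProductRecognizer` class, the 0.8 threshold and the simple trust policy (enroll when nothing matches, flag a mismatch when camera and barcode disagree) are hypothetical and are not the Metropolis similarity search or recognition evaluator microservices.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


class ProductRecognizer:
    """Few-shot product recognition backed by reference embeddings."""

    def __init__(self, min_sim=0.8):
        self.min_sim = min_sim
        self.db = []  # list of (label, reference embedding)

    def visual_label(self, embedding):
        """Label of the most similar reference, or None if nothing is close enough."""
        best_label, best_sim = None, self.min_sim
        for label, ref in self.db:
            sim = cosine(embedding, ref)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        return best_label

    def scan(self, embedding, barcode_label):
        """Combine the camera embedding with the barcode (POS) label."""
        seen = self.visual_label(embedding)
        if seen is None:
            # Unknown product: trust the barcode and enroll a new reference,
            # so no model retraining is needed.
            self.db.append((barcode_label, list(embedding)))
            return "enrolled"
        return "match" if seen == barcode_label else "mismatch"


if __name__ == "__main__":
    r = ProductRecognizer()
    print(r.scan([1.0, 0.0], "alcohol_pads"))       # first sighting
    print(r.scan([0.99, 0.05], "alcohol_pads"))     # same product seen again
    print(r.scan([0.0, 1.0], "laundry_detergent"))  # a new product
    print(r.scan([0.02, 1.0], "alcohol_pads"))      # camera and barcode disagree
```

A mismatch result is where a loss prevention workflow would raise an alert or request a manual check.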
It's actually the same DeepStream pipeline as earlier, the same microservice. We just switch out the models, which is something you can do for a microservice. It has the same message broker, but this time, instead of Kafka, it's Redis. We replaced the behavior analytics and behavior learning microservices with a similarity search microservice and a recognition evaluator microservice, and the rest of the infrastructure is very similar. Next slide. So we plan to release the Metropolis Microservices at the end of this quarter. And as you can see, one of the key benefits is that it accelerates your application development for the whole process. In addition to the synthetic data generation and model training with TAO and Omniverse that you've seen earlier, it provides reference applications for advanced use cases like multi-camera tracking and retail loss prevention. And because every microservice, and indeed the entire application, is cloud native, fully containerized and packaged with Helm charts, you can easily deploy from the edge to the cloud depending on your situation. And that's the end of my section. Now it's the Q&A.
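
The few-shot behavior described in the retail demo (match camera embeddings against a reference database, fall back to the barcode when the object is unknown, then enroll the new embedding) can be sketched roughly as follows. This is only an illustration, not the Metropolis similarity search or recognition evaluator microservice: the class `EmbeddingStore`, the function `recognize`, the 0.9 threshold, and the tiny 3-value embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingStore:
    """Hypothetical in-memory stand-in for a reference embedding database."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed minimum similarity for a match
        self.entries = []           # list of (label, embedding) pairs

    def add(self, label, embedding):
        self.entries.append((label, list(embedding)))

    def search(self, embedding):
        """Return (label, score) of the best match, or (None, score) if too weak."""
        best_label, best_score = None, -1.0
        for label, reference in self.entries:
            score = cosine_similarity(embedding, reference)
            if score > best_score:
                best_label, best_score = label, score
        if best_score >= self.threshold:
            return best_label, best_score
        return None, best_score


def recognize(store, embedding, barcode_label):
    """Combine the visual match with the barcode cue (assumed policy)."""
    visual_label, _ = store.search(embedding)
    if visual_label is None:
        # Object not known visually yet: trust the barcode and enroll the
        # embedding so later scans of the same product can match visually.
        store.add(barcode_label, embedding)
        return barcode_label, "barcode"
    if visual_label == barcode_label:
        return visual_label, "agree"
    # Camera and barcode disagree: flag the scan for review.
    return visual_label, "mismatch"


if __name__ == "__main__":
    store = EmbeddingStore()
    store.add("alcohol pads", [1.0, 0.0, 0.0])
    print(recognize(store, [0.99, 0.05, 0.0], "alcohol pads"))
    print(recognize(store, [0.0, 1.0, 0.0], "laundry detergent"))
    print(recognize(store, [0.0, 0.98, 0.1], "laundry detergent"))
```

Enrolling on a barcode-only scan is what lets the second laundry detergent scan match visually without any retraining; a real system would likely require several consistent detections before trusting a new reference.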

Unknown Executive

executive
#4

Okay. Let's begin with the Q&A section. We'd like to remind you that you can ask your questions using the Q&A window to the right of the slide presentation. Let's jump right into it. We're seeing a lot of questions on microservices. And the first question is, does multi-camera tracking track any other objects apart from humans?

Unknown Attendee

attendee
#5

Yes, multi-camera tracking can track other types of objects. The algorithm is pretty much class agnostic. However, for the first release, we are initially focusing on people-tracking use cases. As I showed earlier, the DeepStream application has 2 models: PeopleNet for people detection, and a people re-identification model to generate the visual feature embeddings of each person. Those are specific to people detection. So if you want to track a different class of objects, like cars or any others, you just need to use your own custom models and then transmit the perception metadata to the multi-camera tracking. There might be components in the multi-camera tracking, in terms of parameters or heuristics, that work better for or are specific to people detection, so there might be places where you have to configure a couple of parameters as well. But in general, it should work beyond people.

Unknown Executive

executive
#6

And expanding on that, will multi-camera tracking work for non-calibrated cameras that do not have overlapping fields of view?

Unknown Attendee

attendee
#7

Yes. For cameras that are not calibrated, we would use the feature embeddings from people's appearances to track people across space and time. But it would be ideal if you also calibrate the cameras so that they all share the same physical space. We do provide a camera calibration toolkit with a GUI to help you easily calibrate each camera in order to map its pixel space to the physical space. Like I mentioned, all the cameras would then share the same space, which would make tracking even more effective and other types of analytics much better.
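
The idea of mapping each camera's pixel space into one shared physical space can be illustrated with a planar homography. This is a sketch under the assumption that calibration produces a 3x3 matrix per camera that maps image pixels to ground-plane coordinates; the function name `pixel_to_world` and the matrix values below are hypothetical and are not output from the calibration toolkit.

```python
def pixel_to_world(homography, u, v):
    """Map an image pixel (u, v) to ground-plane coordinates (x, y).

    `homography` is a 3x3 matrix given as a list of rows.
    """
    h = homography
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to get physical coordinates.
    return x / w, y / w


# Hypothetical calibration results for two cameras that view the same floor.
CAMERA_A = [[0.01, 0.0, 2.0], [0.0, 0.01, -1.0], [0.0, 0.0, 1.0]]
CAMERA_B = [[0.02, 0.0, 0.0], [0.0, 0.02, -1.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # Different pixels in different cameras, compared in one shared space.
    print(pixel_to_world(CAMERA_A, 100, 200))
    print(pixel_to_world(CAMERA_B, 150, 100))
```

Once detections from every camera are expressed in the same coordinates, distances and speeds between them become meaningful, which is what makes the spatial part of cross-camera tracking possible.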

Unknown Executive

executive
#8

And how can I get access to Metropolis Microservices? And is it free to use?

Unknown Attendee

attendee
#9

So Metropolis Microservices will enter the early access period around the end of this quarter, and it will be free to use. So please check the product web page to sign up for early access when it becomes available.

Unknown Executive

executive
#10

Right, and again, one other question on multi-camera tracking. So this is regarding the ID association. Does it only use spatial temporal cues or does it also use visual information of detections?

Unknown Attendee

attendee
#11

So it uses both. If you refer back to the perception pipeline, it first detects the person and then generates an embedding vector of each person's appearance. It uses that to track people within each camera stream as well as across cameras. And in order to enhance tracking, we also use spatial-temporal association. So it's both. And the people re-identification features are what allow users to track each person across a very long time period. So a person can disappear from all the cameras for 15 minutes, but provided that they're wearing the same clothes or have the same appearance, the system should still be able to track that person.
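
Combining appearance embeddings with spatial-temporal cues, as described in this answer, might look something like the sketch below. It is an illustration only and not the multi-camera tracking algorithm: the 0.7 appearance weight, the 2.0 m/s maximum speed, the linear plausibility falloff, and the detection dictionary layout are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spatio_temporal_score(pos_a, time_a, pos_b, time_b, max_speed=2.0):
    """Plausibility that one person produced both detections.

    Returns 1.0 when no movement is required and falls linearly to 0.0
    when the implied speed reaches max_speed (meters per second) or more.
    """
    elapsed = abs(time_b - time_a)
    distance = math.dist(pos_a, pos_b)
    if elapsed == 0:
        return 1.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - (distance / elapsed) / max_speed)


def association_score(det_a, det_b, appearance_weight=0.7):
    """Weighted mix of appearance similarity and spatio-temporal plausibility."""
    appearance = cosine_similarity(det_a["embedding"], det_b["embedding"])
    plausibility = spatio_temporal_score(
        det_a["position"], det_a["time"], det_b["position"], det_b["time"]
    )
    return appearance_weight * appearance + (1 - appearance_weight) * plausibility


if __name__ == "__main__":
    first = {"embedding": [1.0, 0.0], "position": (0.0, 0.0), "time": 0.0}
    # Same appearance, seen again 15 minutes (900 s) later, 30 m away.
    later = {"embedding": [1.0, 0.0], "position": (30.0, 0.0), "time": 900.0}
    # Different appearance, 100 m away only one second later.
    other = {"embedding": [0.0, 1.0], "position": (100.0, 0.0), "time": 1.0}
    print(association_score(first, later))
    print(association_score(first, other))
```

Because the appearance term does not depend on elapsed time, a person who reappears after a long gap can still score highly, which matches the 15-minute example in the answer.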

Unknown Executive

executive
#12

All right. Just want to quickly introduce [ Aquil ], who leads Omniverse synthetic data generation marketing. Aqil, are you there?

Unknown Attendee

attendee
#13

Yes.

Unknown Executive

executive
#14

So Aquil, a couple of questions on synthetic data generation. So what 3D assets are available in Omniverse?

Unknown Attendee

attendee
#15

I'm sorry. Can you repeat the question, Debraj.

Unknown Executive

executive
#16

Yes, sure. What 3D assets are available in Omniverse?

Unknown Attendee

attendee
#17

Yes, absolutely. So Omniverse is a complete platform that allows developers to build immersive 3D worlds or digital twins of factories. To that extent, with the help of our partners as well as organic efforts, we're building things like forklifts, desks, chairs. These are all readily available within the Omniverse library, or essentially what we call the exchange. The other way to get assets from outside of Omniverse is through places like TurboSquid, which is one of the largest online depots for 3D assets. Once you go into TurboSquid, you'll find a section for Omniverse assets as well, and if you like an asset, you can download it directly into Omniverse. And once you have it, you can modify those assets the way you want, but you can also add labeling. Specifically from a synthetic data-generation standpoint, you need a lot of the metadata that can go into training. So you need to be able to label the items properly, and that can all be done with Omniverse.

Unknown Executive

executive
#18

Right. There is a question about using a mix of synthetic data and real data. Is it good enough to go only with synthetic-generated data, or should we include real data as well? Is there any rule-of-thumb ratio here?

Unknown Attendee

attendee
#19

Yes. I don't think there's necessarily a strong rule of thumb. I think it's always a good idea to have real data. One of the methods that we've looked at internally, and that, in many cases, we're still vetting with different use cases, is to use synthetic data to train the model first, have your ground truth real data to validate the model, and then fine-tune that very model with the real data. So obviously, if your real data is, let's say, maybe only 100 images and your synthetic data is 5,000, that ratio is not big. So you could actually increase or decrease the number of real images, or perhaps increase the synthetic images, to be able to get that ratio. At the end of the day, it's all determined by the model. And so typically, what you want to do, if you don't have any real data or have very little, is start off with synthetic data. Then you can test on real data to see where you're at. But then, in the second round, you can actually take the real data, perhaps even augment it, and then fine-tune. And when you do fine-tuning, you're not doing the training from scratch, right? You're just applying transfer learning, which then lets you get away with very little real data at that point. So that is a method that we're seeing really good success with. But there are other ways to do it. If you can get your hands on more real data, that's great. But typically, you'd want to have the real data for validation, but also for testing at the end of the day before you deploy the model.
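
The order of steps in this answer (train on synthetic data, validate on real data, fine-tune on a small real set, then test on held-out real data) can be shown with a toy example. This is a sketch under heavy assumptions: a 1-D nearest-centroid classifier stands in for a real vision model, the "fine-tuning" is a simple blend of pretrained centroids with real-data means rather than actual transfer learning, and every data point is invented.

```python
def class_means(samples):
    """Return {label: mean value} for 1-D (value, label) samples."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def fine_tune(centroids, samples, rate=0.5):
    """Start from pretrained centroids and move them toward real-data means."""
    real = class_means(samples)
    return {
        label: (1 - rate) * center + rate * real.get(label, center)
        for label, center in centroids.items()
    }


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of (value, label) samples predicted correctly."""
    return sum(predict(centroids, v) == y for v, y in samples) / len(samples)


# Invented data: synthetic samples are plentiful, real samples are scarce
# and come from a slightly shifted distribution.
SYNTHETIC = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
REAL_FINE_TUNE = [(-0.1, 0), (0.5, 0), (1.5, 1), (2.5, 1)]
REAL_VALIDATION = [(0.4, 0), (1.2, 1)]
REAL_TEST = [(0.3, 0), (1.8, 1)]

# Round 1: train on synthetic data only, then check against real data.
pretrained = class_means(SYNTHETIC)
# Round 2: fine-tune the same model with the small real set.
fine_tuned = fine_tune(pretrained, REAL_FINE_TUNE)

if __name__ == "__main__":
    print("validation before fine-tuning:", accuracy(pretrained, REAL_VALIDATION))
    print("validation after fine-tuning:", accuracy(fine_tuned, REAL_VALIDATION))
    print("held-out real test:", accuracy(fine_tuned, REAL_TEST))
```

Keeping the validation and test sets separate from the fine-tuning set mirrors the point made above that real data is needed for validation and again for testing before deployment.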

Unknown Executive

executive
#20

And what options do we have for the simulation output? Is it just still images, or do we have options for videos as well?

Unknown Attendee

attendee
#21

It is still static images for training now. That's purely from a synthetic data-generation standpoint. Typically, if, let's say for a factory use case, you're looking to simulate the entire factory, that could be a video that comes out. But really, that output is for presentation purposes; it's more about what data you want to train with. So within an Isaac Sim environment, everything is done within that simulation environment. But let's say, for a computer vision model, synthetic data can be exported out to then train that model. Those are usually 2D images with either segmentation masks or bounding boxes, or you may just have a simple yes-no classification-type model at the end of the day.

Unknown Executive

executive
#22

Perfect. Thanks, Aqil, thanks, Khoa. Looks like those are all the questions we have. So thank you so much, all of you, for joining us for this webinar. An on-demand version of this webcast will be available within an hour after this event ends, and can be accessed using the same link. Thank you so much again. Have a great day. Bye-bye.
