NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary
June 7, 2023
Earnings Call Speaker Segments
Hans Mosesmann
Analyst: Good morning. Good afternoon, everybody. Thank you for joining us for the NVIDIA fireside. Before we get started, Simona Jankowski is going to read us the disclosures, and then we'll jump in. Simona?
Simona Stefan Jankowski
Executive: Yes, thank you very much, Hans, for hosting us. I just wanted to quickly remind the audience that our comments today may contain forward-looking statements, and investors are advised to read our reports filed with the SEC for information that relates to the risks and uncertainties facing our business. Back over to you.
Hans Mosesmann
Analyst: Simona, thanks. Well, we're delighted to have Ian Buck. It's been a couple of years. We couldn't talk last year. There were some conflicts, but Ian is a veteran. He runs everything that has to do with accelerated computing at NVIDIA, which is what the interest level is all about these days as it relates to AI. It includes all hardware, all software, all third-party enablement and marketing activities. Ian is known for being basically, I'll call him, the father of CUDA, which has led to a formidable moat around NVIDIA's business in terms of compiler technology, acceleration libraries, framework optimizations and so on. So there couldn't be a better person, I think, to talk about what's happening in the world of AI and how NVIDIA is playing its part in all of this. So Ian, welcome. How are you?
Ian Buck
Executive: I'm pretty busy, as you can imagine. So yes, it's been an amazing journey and an amazing last few years since we last talked, and actually the last 6 months have been even more exciting. Definitely riding that exponential. So we're cooking here.
Hans Mosesmann
Analyst: Great. How about if we just start. You've been at NVIDIA for almost 20 years, and it's gotten even more intense in terms of what's going on. What's the state of AI today? I know that there are lots of discussions on ChatGPT and generative AI. Just briefly, what's the state of AI in terms of NVIDIA's view and how you guys are participating in it? It's an open-ended question, but I think we'd start off with that, and we can go from there.
Ian Buck
Executive: Yes. Like I said, it's been a 20-year journey so far in accelerated computing as a whole. It started with making our GPUs more programmable, launching CUDA in 2006 and investing in that ecosystem since then. Internally within NVIDIA, that meant building up a software foundation or platform for accelerated computing, as well as working with everybody in the ecosystem to enable GPUs as a computing platform. That, of course, is a broad goal. There were certain markets, in high-performance computing, simulation and others, that adopted first, and that's been broadening since then. And of course, since 2012, AI. We didn't find AI; AI found us. But because of that enablement, making it available everywhere and on every GPU, those researchers up in Canada, working on neural networks with Hinton and others, were able to realize that the math was a pretty good fit for what we were doing around CUDA. So it took off from there. There have been a couple of inflection points along the way for AI. The first one, which was basically the initial work by Alex Krizhevsky up in Canada, was back in 2012 in the initial ImageNet competition. That was the starting point. And really, that was AI for basic image recognition. What is this a picture of? A beach ball, a stop sign, a birthday party. That image recognition expanded into other forms: what is this a statement of? Are there images? What is the sentiment? So taking a review on a web page or a tweet and understanding if it's a positive or negative sentiment, to understand content. That concept of AI, that use of AI, was the initial production ramp that we saw. And we probably all remember Jeff Dean talking about finding cats in videos as an example. And obviously, the hyperscalers, the cloud providers, social media and the Internet needed to understand their content. 
It was the first place where they could really turn their data into AI to understand what people were posting: reviews, products, et cetera. Then came the next shift in AI. Those models became more and more capable. And along the way, AI shifted from a recognition problem to a generative one, being able to not just understand content, be it text, speech, video, whatever, but to generate meaningful content, to create a product description that would make users want to click on a link. That started small and really hit an inflection with BERT, if you remember the original BERT model, which was the first transformer-based model. It had the ability to not just understand text but to produce simple text statements and generate text. In fact NVIDIA was one of the -- we had a birthday, if you remember. In fact, it got noticed even in the markets, this idea of a new kind of neural network called the transformer that could understand text. Prior to that, most applications were convolution-based. They were basically looking at neighborhoods of information and building up an understanding from localized data. This makes sense in image recognition. To recognize a face, you first recognize individual shapes in certain places. My face has 2 circles, a nose, a line for a mouth. And you build up the notion of a face, which is localized. Language is different. Language has all sorts of indirections. What I am speaking about right now is filled with pronouns and context that is only known from other parts of the text or speech that are far away from what I'm saying right now. Transformers were based around this idea of attention, figuring out those distant relationships and incorporating them into a neural network. It started with BERT, invented by Google, and then took off from there. And we've all heard of GPT. The T in GPT is transformer. It's the same idea. 
NVIDIA was obviously swarming on convolutions, image recognition, CNNs. We still do that, and it is still a growing and important use case. But now transformers have taken over, including in some video, to understand distant relationships, and the primary use case for that is speech and human understanding. Speech and language are a hard problem. If you think about computer vision, dogs, cats, even bugs can do basic computer vision from a brain perspective. It's -- you can find highly to do a reasonable job. But only humans really have the gift of language, and it's built upon a deep understanding of knowledge. That was the next inflection point, transformers, which are about understanding knowledge and being able to connect knowledge with language. Still mostly understanding, with a little bit of generation. Today, we're in generative AI. Two areas kicked it off. One is obviously image generation: being able to describe something like a picture of a teddy bear swimming in a big lab, and it generates a picture of that. The other is ChatGPT: being able to have a conversation, understanding what I'm saying, repeating it back and extracting information. There's no database back there. It's one large neural network. And with generative AI, we've moved on from an era of recognition only, which is still important: I can take that data, understand what my content is and make decisions based on what AI is informing me. But now AI itself can provide the content, provide reviews, provide text, engage with customers and generate content, which helps artists, optimizes business and builds new applications and new kinds of services. What's also interesting, though, is that this is unlike other revolutions, like in the PC space or the mobile space, where you guys have all seen the new kinds of applications, new kinds of platforms, new kinds of software. 
This one, generative AI, is actually making the old stuff more interesting. You look at Office 365. Microsoft probably doesn't like me saying this, but Excel wasn't that interesting. Word wasn't that interesting. But with generative AI, wow, it's way interesting again. So in this revolution, we're seeing generative AI create new start-ups and new kinds of services. But it's also making the old stuff super interesting again too, which is a fun double exponential. So that's where we're at. I think we're really at the cusp, the very beginning, of generative AI. Everyone sees the opportunities. You guys do. The market does. The VC community and the investor community do. And you see the amazing start-ups that are being created. That's what makes this super fun right now: seeing all the different applications, the new services and the old, that are getting amplified and changed with AI.
Hans Mosesmann
Analyst: Ian, on the compute, I think a lot of people in the industry, or observers, talk about AI and the parameter complexity doubling every 3, 4, 5 months. How much longer do we have for that? And what are the compute implications for NVIDIA and for the industry, which is going to make use of custom ASICs or even at...
Ian Buck
Executive: Yes. That's a great question. It's one I get asked a lot. So first off, to do generative AI, the AI has to have knowledge, and not just access to knowledge. Certainly, having access to knowledge, like a database to query and pull from, is important, and most do. But the reason why GPT is so big, or why the Megatron 530 billion model that we trained on our supercomputer is so large, is that it has to capture human knowledge at some level in order to be a starting point for a generative model. So that drives the model size up. But it's not just model size. Well, first off, model size tends to be limited: the bigger the model, the longer it takes to train. And people build models up to the limit that's practical. They don't want to wait. And it's not just one training job. To build a model at that scale, you are constantly iterating on that model, on the data, on the tuning of the parameters, to get it to converge to a level of intelligence. There are a lot of training jobs that don't complete but inform the next one, and the next one. So it takes many months to years to build a truly intelligent model. The final model is an eventual convergence of that effort. The challenge tends to be training time. Nobody wants to train for more than a month or 2 at most. If you go past that, it's just hard to innovate if you're waiting that long. So the size of the model tends to be a factor of how much capacity they can put in place and how productive it is at scale. But there's been a general rule of thumb. DL researchers, the ones really developing this stuff and building foundation models, don't really want to have training jobs that take more than a month, because they're just too impatient. 
So as we make faster GPUs, as we figure out how to connect them together faster with InfiniBand, to build more optimized infrastructure, to do things like Grace Hopper and the new DGX GH200, their productivity increases. What they can train in roughly a month gets substantially bigger, and so it gets more intelligent. I will say one other thing, though: model size is only one metric, and model size is just measured in parameters. 175 billion is typical for GPT. We've trained 500 billion. There are 1-trillion-parameter models behind closed doors. People are starting to get a little more secretive and are not releasing these huge models, which obviously are pretrained and are an asset. The other thing that's driving compute is the design of the layers. We're continually tuning the intelligence at each layer, making each layer more optimized and more clever, which increases the complexity per layer. That isn't always captured in parameters, because each layer has a bunch of math and calculations in it instead of just being naive connections, if you will. Human brains, by the way, are similar. We have different kinds of neurons for vision processing versus auditory versus memory. So we specialize by layer in the design. The same happens in AI. The other one is sequence length. I don't know if you guys have noticed, but the longer you play with something like ChatGPT, you can get it to forget the previous conversation, and it will drift. That's a function of sequence length: how much of the conversation it can keep in its store, in its memory, while it's having a conversation. Sequence length increases compute significantly, in both the training and the inference load, which isn't always captured in billions of parameters. It's just how much needs to be processed in order to keep the conversation informed moving forward. And then there's more diversification going on. 
We have lots of different models, from PaLM to LaMDA to GPT-4. So we see different specialization happening. I expect that moving forward, the models will naturally want to get bigger because they can encapsulate more intelligence. We are definitely seeing that happen. We're seeing them integrate more deeply with intelligent databases and applying AI in the database and in information retrieval itself. Vector databases are super interesting. I can talk forever about that, but those are being tied directly to some of these large models. And now AI is working its way into the retrieval systems as well, to inform them. There's specialization happening. So we're seeing multiple different models, specialization of different layers, and longer sequence lengths to keep the conversation more intelligent and keep the AI's working memory more adept, which is also significantly increasing compute requirements. It's a chicken-and-egg problem, which we're trying to help with. Every time we launch a new architecture or a new interconnect technology, or do new innovative things with Grace Hopper and DGX GH200, we expand the scope of what these researchers and developers, and NVIDIA's own researchers, can do in order to move large language models and generative AI forward. The next chapter in that probably will be more about reasoning. Right now, we're in generative AI. I can talk more about reasoning in the future, but that's kind of where we're going, and that's an even harder blue-sky problem.
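As a rough illustration of the sequence-length point above: in standard self-attention, every token is compared with every other token, so the cost of the attention products grows with the square of the sequence length. The sketch below is back-of-the-envelope arithmetic and was not presented on the call; the function name and the hidden size of 4096 are assumptions chosen only for illustration, and it counts only the two attention matrix products in a single layer.

```python
def attention_matmul_flops(seq_len: int, d_model: int) -> int:
    """Approximate FLOPs for the two attention matrix products in one layer.

    Q @ K^T      : (seq_len x d_model) @ (d_model x seq_len) -> 2 * seq_len^2 * d_model
    scores @ V   : (seq_len x seq_len) @ (seq_len x d_model) -> 2 * seq_len^2 * d_model
    Projections, softmax and the feed-forward block are deliberately left out.
    """
    return 4 * seq_len * seq_len * d_model


D_MODEL = 4096  # hypothetical hidden size, for illustration only

baseline = attention_matmul_flops(512, D_MODEL)
for seq_len in (512, 2048, 8192):
    flops = attention_matmul_flops(seq_len, D_MODEL)
    # Ratio to the 512-token baseline shows the quadratic growth.
    print(f"seq_len={seq_len:5d}  flops={flops:.3e}  vs 512 tokens: {flops // baseline}x")
```

Under these assumptions, a 4x longer context costs 16x more in the attention products even though the parameter count is unchanged, which is one way to read the remark that this load is not captured by model size alone.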
Hans Mosesmann
Analyst: Well, okay. So it looks like we're going to be in a growth pattern here for some time. For those that are listening, investors, participants, if you'd like to ask a question, just click on the Ask a Question button on the right of your screen. It will come to me, and I can read it out for Ian to talk about. There's a new metric. It's kind of interesting. I was talking to some contacts in Silicon Valley maybe 6 months ago or so. The price of Hopper and the DGX Hopper was starting to come out. And it was really, really expensive. And there were some people who were saying, "There's no way we're going to pay that kind of price for this kind of system. It's so much more expensive than, say, Ampere." And yet here we are, and you're probably hand-to-mouth for the better part of this year, which brings to mind that the issue for some of these AI models, for training and inference, has little to do with the upfront price. That's less relevant, and it's really the TCO aspect or the efficiency aspect that comes into play. How does that determine how you come to market, how you architect your compute GPUs and so on?
Ian Buck
Executive: Yes. I really appreciate that question, and it's one that gets asked a lot, because this entire community, the world, sees pricing and gets sticker shock. And by the way, they usually don't realize what it takes to build a hyperscale data center and the cost that goes into it. These are multibillion-dollar investments. That's not new; people are building data centers at scale, and I get to work with all of the hyperscalers on that. So the productivity and the utility of compute is incredibly important to them in order to improve their service, improve what they're doing, optimize their business and increase their revenue. Compute is critical to ad generation, putting the right content in front of you, keeping those engagement scores high, surfacing the products you want, providing a service. Nothing is more annoying than getting useless ads, but getting ones that actually are the things you want and the information you need leads to revenues. It's critical. And while people may see the cost of a GPU, we create the opportunity for all of them to invest and build those services, to turn AI into the opportunity that it provides for them, based on computing on their data. Specifically though, when we talk about generation to generation, how we think about introducing a new GPU or new technology into the market, it is TCO. It is about how we are revolutionizing not just the compute capability, but also the TCO analysis of what you can do today with our existing products and tomorrow with the next one. Hopper provides 6x more compute performance than Ampere at the transformer level, implementing that transformer layer. End-to-end on training, it's delivering 3 to 4x more performance. That's the throughput of a complete training job. And on inference, even more. Inference, which can be further optimized. 
And then, of course, when we think about that, we look at more than just the cost of an individual node. We look at what the throughput of that entire data center is going to be, based on what they have today and what they're going to be able to do tomorrow. And we save them a ton of money. We save them a ton of money by transitioning from one generation to the next, because of the performance opportunity, and the economics, the TCO, are hugely in their favor in terms of the throughput and productivity of the data center. Those are billion-dollar investments, the billions that it takes to build those data centers all around the world. That same story plays out in enterprise as well. By moving workloads from CPUs or from previous-generation GPUs to new GPUs, the throughput of the system, or the rack, or the data center at data center scale, often improves by X factors. Certainly for the transformer-based models, but we also look at the breadth of all the different workloads, including the representative models and image recognition benchmarks you see in MLPerf, for example. MLPerf, if you guys haven't heard of it, is a benchmark. It was created by Google to provide a level playing field, a clean, clear benchmark. It was representative of their training workloads. And since then, Meta has also been contributing their workloads to provide an honest benchmark that trains to the correct level of accuracy, with convergence as a requirement. We use that to measure our performance in the market relative to the previous generation. And you can see what Hopper has done compared to Ampere. The other interesting point is that we don't stop after we ship it. We continuously invest in the software and optimizations. Software is a massive part of what we do. 
I myself started as a software engineering manager at NVIDIA doing CUDA. I have hired thousands of software engineers and others across the company. And one of the reasons I have this job now is the importance of software to what we do. It is our interface to the rest of the world, to the people that are consuming our technology, partnering on the frameworks like PyTorch and JAX and TensorFlow and everything else, the rest of the system and the end-user community. So NVIDIA, at this point, has more software engineers than hardware engineers by a good margin. And so after we do the first round of benchmarking on something like Hopper, we continuously improve it. In fact, Ampere over its life got, I believe, 2.5x to 3x faster. If you look from the first time we submitted to the MLPerf benchmark, which is public, to our more recent submissions as we shifted over to Hopper, you can see a 2.5x improvement in some of those models and use cases. So that's kind of what our users experience. I think it's why we have such a loyal community of users, both in the developer community and among our biggest customers. It's because we're continuously optimizing the whole stack and the platform, along with them, to improve the TCO.
Hans Mosesmann
Analyst: Great. Ian, I did get a question here. It's an interesting one. Can you expand on the current issues with scaling sequence length and how that might be solved? There seems to be a push for new architectures that have more favorable scaling functions. Would this be a risk or opportunity for NVIDIA's advantage with its Transformer engine?
Ian Buck
Executive: Yes, good question. So let me elaborate a little bit on that. Working with the customers and the users in the community, you can take a relatively small or a large model, and a larger input sequence length provides more context for the conversation moving forward. It's measured in tokens. It started with hundreds, it's now going to thousands, and they want to push it up higher. That increases the compute complexity of the inference job and also affects how you want to tune for training. Scalability is really important there. Capacity is also important. It creates a larger working memory alongside a larger model. There are multiple ways to address that. First is the transformer engine. You mentioned that. What Hopper did that was so revolutionary, so impactful, was it introduced something called FP8. FP8 is an 8-bit floating point representation. It's basically 8 0s and 1s to represent a floating point number. It's not a lot of information. It's about the number of characters on an alphanumeric keyboard, for example, times 2. So take every character you can type, roughly double that, and that's how much you can represent in those bits. But if you can make training work at FP8, it's incredibly fast. Obviously, computing on 8 bits is faster than computing on 16. Also, the memory size is half of what you would have with a 16-bit starting point, which is what we were using before. The transformer engine is specifically designed for this. You can't just drop the precision, cut the amount of information by 8 bits, which is an exponential reduction, and expect things to train successfully. The transformer engine in Hopper is actually a combination of both hardware and software to make sure that transformer models can train to convergence with only those 8 bits of information at the core computing unit. 
And it's a ton of work. It actually consumed a massive amount of our own supercomputing capability to make that work, to understand and figure out how to keep things within the range of those 8 bits. I mention that for sequence length because, by doing so, we reduce the size of the model, the size of the working set. You can fit more in a 96-, 94- or 80-gig GPU, depending on which flavor of Hopper is available. And of course, it keeps the response times, how fast it can respond to a question, within a range of usability. After that, we scale. If we need more GPU compute in order to expand further, we scale with NVLink. On Hopper, NVLink is 900 gigabytes a second, which is a lot. It's roughly 7x, I think, more than what you get with PCIe, the standard way of connecting devices inside the system. It lets us basically combine 2 GPUs together into one. If we split the model and execute it in parallel across 2 GPUs, you need that much bandwidth between the GPUs to keep things going, to allow the GPUs to operate as one and keep the response latency down. If you need more, we can go from 2 GPUs with the H100 NVL product, which is actually 2 PCIe cards with a bridge, to 8-way. So we have an HGX 8-way system that can go across 8. And then beyond that, we can use other tricks with InfiniBand. Or we can go all the way to Ranger, our DGX GH200, which is 256 GPUs all connected together, which we just announced at Computex 2 weeks ago. The other thing is the size of the model. How can we do bigger models? Even if we don't need the latency, or for some smaller models with longer sequence length that could be served with a single Hopper in terms of performance, for that we have Grace Hopper. We've announced it, and we've been talking about it at our GTC. 
If you haven't seen our GTC conference, you should check it out. Grace Hopper basically is a 600-gigabyte GPU. We combine the GPU, which has upwards of 96 gigabytes of HBM memory, with the CPU, connected together with NVLink again, so that the GPU can take advantage of all of the CPU memory, which operates at upwards of 500, 600 gigabytes a second. So we can now have effectively a 600-gigabyte GPU, and that also helps with doing these larger sequence lengths. There are lots of ways to do this, and actually I pity your community having to do all this analysis. It's becoming a complex matrix of model size, latency requirements and sequence length. And we're blanketing the space. That's why we're creating so many different variants, so many different products: Hopper in a PCIe form factor, Hopper in an NVLink HGX form factor, 2 PCIe cards bridged together and now Grace Hopper, all of which can be used to deploy inference at scale.
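The memory side of the FP8 point above can be sketched with simple arithmetic: halving the bits per parameter halves the bytes needed for the weights, which in turn changes how many GPUs are needed just to hold them. This is an illustrative calculation and was not presented on the call. The function names are hypothetical, the 175-billion-parameter and 80 GB figures are taken from numbers mentioned in the discussion, and the estimate covers weights only, ignoring activations, optimizer state and any other overhead.

```python
import math


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def min_gpus_for_weights(num_params: int, bits_per_param: int, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory can hold the weights.

    Weights only: activations, caches and optimizer state are ignored.
    """
    return math.ceil(weight_memory_gb(num_params, bits_per_param) / gpu_memory_gb)


PARAMS = 175_000_000_000  # GPT-scale parameter count quoted in the discussion
GPU_MEMORY_GB = 80        # one of the Hopper memory sizes mentioned above

for bits in (16, 8):
    print(
        f"{bits:2d}-bit: {2 ** bits} distinct bit patterns, "
        f"{weight_memory_gb(PARAMS, bits)} GB of weights, "
        f"at least {min_gpus_for_weights(PARAMS, bits, GPU_MEMORY_GB)} GPUs of {GPU_MEMORY_GB} GB"
    )
```

Going from 16 to 8 bits cuts the number of representable values from 65,536 to 256, which is why, as described above, making training converge at that precision takes dedicated hardware and software work rather than a simple type change.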
Hans Mosesmann
Analyst: Wow, that's a big answer.
Ian Buck
Executive: Yes, I apologize. This is what I do every day: working with each of those hyperscalers, those start-ups and everyone else to dial it in and create new products to address it.
Hans Mosesmann
Analyst: So it looks like, because you're blanketing the market with various types of products, you can counter some of the different proprietary or new architectures that have emerged and are being proposed. Is that kind of what you're saying?
Ian Buck
Executive: Yes. There's not one click anymore in NVIDIA's road map. I think it used to be: here's our Pascal GPU. And 3 years later, there's Volta. And 3 years later, there's Ampere. 100, 100, 100. What NVIDIA has been working on is diversifying the ways in which we can add value. Now we build CPUs, GPUs, DPUs. We work on InfiniBand and Ethernet and make both of those platforms AI capable for different paths. And then we can play with and optimize how we connect all these things together and build different products, even within one traditional GPU generation, and meet the demand wherever it wants to go, based on where it is going. So that agility is really important in AI. Things are being invented all the time. And NVIDIA, being sort of the one AI company that works with every AI company, that's why you're seeing these products. It's because we're seeing different aspects of what people need to be able to dial in and bring to market, perhaps at the peril of our partners who have this now. We're trying to meet all that demand, to optimize that workload. The other thing I'll say is that we've also accelerated our GPU road map. We used to do 100-class GPUs every 3 years. We're now down to 2 years. In some cases, an 18-month cycle. Jensen has talked about Hopper Next and that time line. And in addition, there's Grace Next, and there's Quantum Next for interconnect, and we've accelerated that now. So we're now able to invest and deliver a new GPU every 2 years or 18 months, depending on how you look at it.
Hans Mosesmann
Analyst: Well, that's good to know. We got a bunch of questions that have just come in. We don't have a lot of time. This is a tactical one. I'm not sure if you can answer. Maybe Simona can come in. Can you please talk about efforts to source supply for the second half of the year? And how does NVIDIA define significant, as mentioned in the latest conference call?
Ian Buck
Executive: Yes, Simona can comment a little bit more on the conference call details and I can follow.
Simona Stefan Jankowski
Executive: Sure. Happy to do so. I hope you guys can hear me okay. We spoke to it on the earnings call: we are going to have substantially higher supply in the second half relative to the first half of the year, and that essentially backs up the extended demand visibility that we see stretching out a few quarters into year-end as well. As we commented, we have seen a pretty steep increase in demand through the quarter all the way leading up to the current time, and so we're working closely with customers to ensure that we have supply for them. That also helped underpin the strong guidance we gave for the second quarter. And then even with that higher baseline in the second quarter, we commented on a substantially higher level of supply in the second half versus the first half. We haven't been more granular on the exact linearity between Q3 and Q4. So just give us a bit of time. As we get closer to the back half of the year, we'll be able to provide guidance quarter-by-quarter.
Ian Buck
Executive: Okay. I don't have much more to add to that. Our biggest customers are, of course, planning with us. And everyone is swarming on generative AI. But as part of that, we are, of course, doing that planning with them, and we continue to do that while we're doing all the things that Simona mentioned at the same time.
Hans Mosesmann
Analyst: Okay. Here's another question along the same lines, maybe you can answer this. What is the biggest bottleneck for NVIDIA or GPUs more broadly? How much time do you think it will take for the industry to build sufficient inventory or supply levels?
Ian Buck
Executive: Well, I'm not going to comment on specific bottlenecks from a supply standpoint. I think the challenge, or bottleneck perhaps, in further adoption, and it's not really a bottleneck, it's just where it is going, is the broadening. We're now seeing the enterprises pick up AI. And for that to happen, the providers of AI, including NVIDIA, need to meet the enterprises where they are. Some of them have their own AI expertise. They've either, through acquisition or hiring, brought it in-house, or they're working closely with a start-up or others to adopt AI to influence or improve their business. And you see that in some of the large language model start-ups, for example, providing that. I really like the work that Adept AI is doing, for example, making it easier to use older software, with an AI that can click all the buttons and check all the boxes in existing software. It's a great way of doing those things. But it's also about meeting the needs of those enterprises where they are, perhaps with a service, or by taking a pretrained model and fine-tuning it into something useful, where really the only thing the enterprise needs to do is provide the right kind of data and convert a pretrained model into their own virtual assistant or chat capability. So a lot of the activity right now is about helping them adopt AI into their workflows and into their products. Some of the work we've been doing on our own NeMo product, for example, is exactly that, making it easy to use: you can provide a few hundred, up to maybe 1,000 or 2,000, examples, text in and text out, and it can fine-tune a GPT model, all the way up to 175 billion parameters or larger, to answer questions in that format, in that context. 
Instead of just asking a generic ChatGPT a question, which gets you a generic answer from a generic human or generic how the Internet would answer it, you can have a -- you can answer it like a financial expert or a support call expert or other such things. Connecting AI with information retrieval systems. So when you ask a question, you don't just get an answer, which may or may not be right. And certainly, it can -- we can all make ChatGPT lie or a write code that actually looks right, but it's some things made up, but actually get to the actual source of the information. You see that a little bit with the work that Bing is doing. But more broadly, generative AI is useful if you can only generate an answer but tell you where the sources that we're seeing and you can do further explore their results. So that democratization in -- is the -- in connecting all those GPUs with industry is a big push right now, and you're starting to see the early movers in that space. That's where that next wave of GPU usage and also revenue from services and things like our DGX Cloud efforts as well as our partners is going to -- is moving the needle.
Hans Mosesmann
analystLast question. We've got a minute. Let's see if we can keep it to a minute. How have the conversations with your biggest clients, hyperscalers or enterprises, changed after your last earnings call? Because it seems from what we hear that this was a real wake-up call for many decision-makers on how to make serious investments in AI. So the question is, how quickly has this changed since the conference call, which was basically 2, 3 weeks ago?
Ian Buck
executiveI don't know that it's changed since the conference call. It certainly changed in the ChatGPT moment and the generative AI moment. And that probably only continues to be amplified by the activity you're seeing on the Street. But they see the opportunity for generative AI in every one of their services and every one of their capabilities, and they see NVIDIA not just as a supplier of GPUs, which we are, or a supplier of infrastructure, but as a partner at many levels. We always were a partner in their hyperscale efforts, the development of their servers, the design of their data centers, how to build something even more capable and optimized and power-efficient and at scale. Every one of them has their unique challenges and capabilities and their own technology that they can contribute, working with NVIDIA to make it work well. Take Amazon's EFA, the Elastic Fabric Adapter. We've done a lot of work to make sure that it can work at scale. For other hyperscalers that have their own, or that use InfiniBand, we're working with them to scale out InfiniBand or their Ethernet platforms. We've always been a partner on the data center side; that's only amplified our links. Now it's about Grace and Grace Hopper and CPU land and what we can do in that space, which is very exciting. The next step up was the software side. So all the capabilities in the software and the infrastructure, integrating into all the different frameworks and the core capabilities, and broadening across all their services so all their developers and researchers can get access to that infrastructure, that's only amplified. We always were a partner with them on PyTorch and TensorFlow and JAX and others. That certainly has continued and is growing. What we're seeing now is more and more of their service groups seeing NVIDIA as also a partner to optimize their services for the latest platform or what we have to offer in generative AI.
They're seeing the opportunity to do more with less, moving workloads that were still on CPU, or using a little bit of AI or simplified AI, to much more intelligent, larger models, to improve the quality of service, to have a better interaction with the device, even if it's talking to a hockey puck on your kitchen counter or the cloud. And we're also a partner with them to optimize those models. That 2.5x, 3x that we did with the A100 wasn't just us working in the back office. It came from the optimizations we were doing because our biggest customers were giving us challenges and working side by side with us to optimize those workloads, which obviously gets reflected in things like benchmarks and elsewhere. So that is amplified. And certainly, our engagements with their service teams deploying AI, figuring out how to use Hopper, use Hopper at scale, do inference better and more efficiently, move more of the workloads to the GPU infrastructure they have, and plan for growth moving forward, are a big part of it. That definitely has gone up quite a few ticks.
Hans Mosesmann
analystWe can imagine. Well, Ian, thank you so much, very enlightening. It looks like you're going to be hiring another 1,000 software engineers. Hopefully, you don't have to do all the interviews yourself. But exciting times. Simona, thank you as well. And we look forward to the group session later this afternoon. Have a great day, and thanks.
Ian Buck
executiveThank you. And hopefully, see you again in person.
Hans Mosesmann
analystYou got it.
Simona Stefan Jankowski
executiveThank you. Bye-bye.
Ian Buck
executiveBye, Hans.