Arista Networks, Inc. (ANET) Earnings Call Transcript & Summary
June 5, 2024
Earnings Call Speaker Segments
Operator
Please welcome to the stage, Chantelle Breithaupt.
Chantelle Breithaupt
Good afternoon, and a very, very warm welcome. Thank you for being with us. Thank you for those in the room here at this beautiful building at the New York Stock Exchange and for those joining us, a warm welcome to you. We're super grateful that you're here to help us celebrate our 10-year anniversary in the IPO for a record. I've only been here 4 months and I'm super excited to be here. I imagine those who have known the company much longer how this feels for both the team and you as a community joining us. I wish you a great next 2 hours, in a sense we have lots of great speakers, events and things to show and share with you to celebrate. And with that, let's get started by introducing someone who needs no introduction, who's one of my favorite people to work with and deserves all the accolades. Our CEO and Chairperson, Jayshree Ullal.
Jayshree Ullal
Good afternoon, everyone. First of all, I want to just thank you all for celebrating with us. Today is less about our talks and more about the celebration. And I'm really touched when I look around the room and see so many of you who have affected Arista in a variety of ways -- Arista stock in a variety of ways. And although we are here to celebrate the first 10 years, I couldn't be more excited about the next decade ahead of us. Before I go much further on, I just want to acknowledge a few people in the room. Every one of you have played a special role, but especially to some of my mentors who played a role early on, Joe Tucci, former CEO of EMC is here with us. Hock Tan, Hock will be speaking. He's a great partner of ours and a special guest today. And for those of you who think I don't have a boss, I actually have many bosses. So, 3 of them are here today, Lewis Chew, Yvonne, and Mark Templeton. And Nikos was my former boss, so my Board members, thank you for coming. So I'm going to kick it off. Of course, no presentation would be complete without using the word AI. And I don't know if you saw last week that obviously, Arista has been a market share leader of data centers. But today, we think it's not just the data center and the front-end network, but increasingly, the cohesive capabilities of integrating both the back-end with AI and the front-end with data centers into what we call the AI Center. So I'm going to kick it off with that, and then I'll have my luminary founders, both Andy Bechtolsheim and Ken Duda join me in giving more details. But first, before I do that, wow, when I look at this, I still remember when Andy and I and our CFO went on the IPO Roadshow. They were a little shocked that, A, we were doing this kind of revenue, $500 million at IPO, and even more shocked that we had profitability. Profitability was not in vogue back then. I think 10 years later, it's very much in vogue, and thank you all, guys, for supporting this journey. 
You can see here, we started at 2 million ports, and today, we're here to celebrate not 2 million ports, but 100 million ports after 10 years. Not a few thousand customers, but 10,000 customers. So, 10 years later, we have achieved 10,000 customers and 100 million ports, and this is a huge milestone, and obviously, we've gone well beyond $5 billion. So as we march to the next milestone, which will probably be 10 million, I want to say a lot of it is characterized by this foundation we have laid in the last 10 years. Well, today, we'll spend a lot of time talking about networking for AI. And we're going to share with you, we just announced today our AI 800-gig platforms based on the Broadcom technology called Etherlink. We also have a huge focus on AI for networking. So networking for AI is when you're running the AI applications, and the infrastructure needs to have the right behavior to deal with the scale of GPUs, the accelerators, and really build a cluster that's reliable with the right congestion control mechanisms, load balancing, packet spraying, but equally important is the predictive analytics, the observability and security using our AVA sensors for AI for networking. So these 2 sort of go hand-in-hand in building those holistic AI capabilities. Today's networks, though, are really built, and this is where Arista participates, very much in the green section where we build the front end with a highly scalable Ethernet network. The back end, which is GPUs, is typically still InfiniBand. You all are familiar with this technology. It's been around through the InfiniBand Trade Association. It's a vendor of one. And it's very much an HPC cluster that's now turned into an AI cluster. Where we believe the world is heading, though, is with all of these massive training clusters leading to workloads that are anywhere from 1 billion to 1 trillion parameters, Ethernet's going to become the back end again. And nobody knows this better than Arista. 
We have reinvented Ethernet at least 3x or 4x. We started our humble beginnings with high frequency trading, low latency, 600 nanoseconds, going down to 150 nanoseconds where we had to stretch Ethernet to achieve those kind of goals. And then we scaled some of the world's largest cloud providers, building 100,000 servers and enabling cloud scale switching, again, stretching the boundaries of Ethernet with an active-active MLAG or ECMP technology. In the third incarnation, we added routing, once again, pushing Ethernet to boundaries where you didn't anymore need a separate router, but leveraging our partners with merchant silicon, we could now scale routing from a mere 256K initially to millions of addresses. And now today is, I think, the next phase of Ethernet, pushing the boundaries of AI to really make Ethernet once again capable. So, in the old days, we talked about much more of a perimeter and protocol focus for traditional networking. Now we talk much more about distributed computing and the ability, therefore, to handle all of these data-driven AI centers. And this is the power, I think, of our next decade and how we approach it. This morning, we introduced some major products. We introduced an AI leaf running at 51T based on the Broadcom Tomahawk 5 chipset, going all the way to 460T with the J3 AI chipset, and you can see here we have AI leafs that are fixed, AI spines, and we can address a range of non-blocking density. AI is extremely data and compute intensive. You're constantly sending more and more compute, exchange, and reduction, and we're doing this with the lowest power, the least complexity, and the best fan-out possible. 
We're also able to now build the leaf-spine topology you all know and love from the cloud side for the AI side, where we can have a multi-petabit bisection bandwidth and enrich this with a huge bunch of load balancing capabilities, explicit throttling capabilities like PFC and ECN, now scaling to not just 1,000 GPUs, but going all the way to 10,000 and even 100,000 GPUs. And finally, today we introduced a brand-new product with a single Etherlink connect called the Arista 7700. This is the first time we would not just introduce a leaf and a spine, but a single-stage scheduled product that can be 100% efficient and then, therefore, build even larger clusters with multi-tiers. So this product is very unique to Arista, again leveraging our partners, and the beauty of this 3-stage family, an Arista Etherlink leaf, an Arista Etherlink spine, and now a single Etherlink, distributed Etherlink switch, is we can build multiple permutations and combinations of 800-gig capability on the back end. And at the same time, we are GPU agnostic. Today it's obviously largely NVIDIA, but you'll see more of them come from AMD and Intel. We're NIC agnostic, we're going to be working with all the NICs available, including the BlueField and CX, and then the Broadcom Thor, and AMD and Intel in the future, and enabling -- leveraging once again our EOS, a suite of smart features that run on these platforms. Now why is Etherlink so powerful? Well, traditional Ethernet allows you to achieve certain peak performance over a wide range of packet sizes. By adding all of these optimizations at extremely high speed, we can now gain more than a 65% improvement on Ethernet with Etherlink. And this allows us to outperform not only InfiniBand technologies, but even familiar Ethernet technologies. There's no requirement for twiddling agents or NICs or anything. This is all done on the network, and it can work with any GPU and any NIC. 
This kind of high bandwidth, high fairness, rapid recovery and delivery is really important for job completion time in an AI network. And this is why we made these improvements and we introduced this product line today. And you'll hear more from Andy on this as well. Unquestionably we can build many permutations and combinations of modular and fixed designs, and you can see here that you can have a single tier and connect as much as 1,000 ports, or you can go to a 2-tier and connect a greater layer of density and ports. The Arista AI Center vision that goes beyond our switches. We will be working with an ecosystem of NICs. You're going to see a demo from Simon and the team today of how we can enable that EOS agent not only on the NIC, but through the switch gain incredible visibility of what's going on. One of the dangers of high performance, high speed networking is packets are moving so fast and you're getting so much bursts of activity that the visibility is as important as the performance. And this is something that we've been paying attention to. That doesn't mean we build everything, but we make sure we can cohesively work in a homogeneous way across these things. And finally, I'd like to say, with the addition of all these AI products, organically we believe our TAM is increasing. Last year, I think we shared with you in November, approximately $60 billion TAM. We think now by 2028, we're close to a $70 billion TAM. And a lot of work to do in executing on all of these fronts, including AI, but I want to, from the bottom of my heart, spend a little less time on technology and thank you for the first 10 years and all of your support, and thank you for the future support you'll give us for the next 10 years. We're going to be having a lot of short talks here, and then eventually we're going to cut a cake. That's a real cake, by the way. It looks like cardboard over there, but it's very edible. 
And so with that, I'd like to send it over to my friend and colleague and founder of the company and Chief Architect. Oh, is it you or is it -- okay, sorry, sorry, Quentin. Are you going to -- all right, so [ Quentin ], over to you to be the master of ceremonies and introduce, and then we'll invite Andy over. Come on down. Thank you.
Operator
Ladies and gentlemen, [ Quentin Hardy ].
Unknown Executive
Our humble beginnings started in high frequency trading. I really like that. That's a good place to start and grow from. Welcome to a celebration of honoring Arista's 10 years as a public company, and also honoring Arista's past as a prelude to the future. You just saw a little bit of it right there. I want to begin by sticking my neck out and guessing that a couple of people in this room have had a buy on this stock at some time in the past 10 years. Is that right? Anybody? Okay, good. Thank you. Because I want to thank you for being smart. What you saw was 10 years, and even before the IPO, Arista being able to execute on a definition of the job they had to do, on their technology approach, on network engineering excellence from interacting at the chip level to the highest levels of the networking stack, on choosing what to do and equally important, what not to do, on sustained focus, and most of all, on customer commitment. It's a key part of the culture here. So much so that today, many Arista customers have come together to offer their appreciation and congratulations, which we can now see a part of in a video that I'd like the AV people to run now. Let's take a look. [Presentation]
Unknown Executive
And thousands more customers feel the same way. I know that matters more to the team at Arista than anything else. But I also know that at Arista, there is no celebration of past achievements without a redoubled commitment to the future. And today, the future is particularly exciting. As Jayshree was just talking about, AI is an extraordinary field that presents business opportunities, technical challenges, new ways to architect our infrastructure. Could be in design, could be in provisioning and scaling, observability, QoS, troubleshooting, security or all of the above, as Jayshree wrote about in her blog last week, the new era of AI centers. If you haven't read it yet, you should. It's very exciting stuff about harmonizing compute in a network all the way to agents on NICs and GPUs. This morning, I was talking to her about it. And she said, well, it was so easy to write that. Sometimes it just flows out, which indicates to me you'll be hearing much, much more from Arista and from the CEO about what this means. But even before that, Arista would like to offer you a brief talk from Arista's Co-Founder and well-known tech visionary, Andy Bechtolsheim, on the big picture, the future of AI. Andy?
Andreas Bechtolsheim
Well, first of all, thank you very much for joining us on this occasion of our 10-year anniversary as a public company. Now, to jump right in here, AI has been all over the news. I'm not telling you anything new here. But keep in mind that the size of the models people have been training on has been growing dramatically over the last many years, and there's no signs of slowing down. And this is before we're getting to these new multimodal models, which will combine text, images, and video from the real world. Now, the bottom line is customers want much, much, much larger clusters, because basically the training time on these large models is way too long. It's measured in like 6 months kind of periods, and that's not a way to innovate. So the goal is to get this down to a matter of weeks. And the way to get there is to build much larger clusters, up to hundreds of thousands of what we call XPUs, accelerator chips. And the other truth is traditional network designs are not optimized for AI, but being able to scale these AI clusters to maximum performance is very important given their cost. Now, Ethernet, of course, is the universal standard in networking. The packet format hasn't changed since 1982. And the physical layers have evolved. Right now, we are just entering the 800-gig era, shipping this year in volume, and 2 years from now, 1,600 gigs, so that's all under control. And it's clearly the only network that exists that has proven scalability to hundreds of thousands of servers in the cloud today. And this was done, as we previously mentioned, with ECMP load balancing, leaf-spine architecture, which we helped to pioneer, and this is how the world's largest cloud data centers work today. However, ECMP by itself is not sufficient for AI. And the reason is that the traffic is different. In a normal, traditional -- I hate to call it a legacy, but the CPU networks of today, you have random traffic patterns, short-lived flows. 
If there's a little congestion, there's a minor impact, 100-gig connectivity per CPU, it can be oversubscribed, maybe 10 to 1, so a big data center is like a petabit per second. In the AI world, none of this applies. You have all-to-all, all-reduce traffic patterns, they are very long-lived flows. Any impact of congestion is very significant. Today it's 400-gig, but moving up to 800-gig very shortly, the network has got to be non-blocking, highest throughput, and by the time you add this all up, you're talking hundreds of petabits per second per large hyperscale network. And again, the traffic is hard. You have these, they call them elephant flows, because they're so long, they don't stop, many megabytes of data, transmitting with RDMA, which is very sensitive to packet loss. You have incast congestions, basically everything is sensitive to unfairness and problems within the network. This is a data site published by [ NIDA ], I think it was actually 2 years ago, that simply talked about how much time a large AI cluster spends on waiting for the network. Would you believe 30%, 35%, up to 50% just waiting time? Clearly this is not good if you just spend billions of dollars on the latest AI chips. So, the goal in life here is to reduce -- to maximize the throughput, to minimize what's known as job completion time, JCT, and this involves eliminating incast congestion, that's the moment things congest, other things don't get through. You want to optimize the load balancing globally across the whole fabric, get to 100% utilization, and also importantly, you want to allocate the bandwidth fairly to all these thousands of different flows that go through the traffic at the same time. And this is much harder than it appears because congestion is the natural result of multiple sources talking to the same destination. 
Yes, you can, in the network, route around this, but you cannot get around the fact there's 1 output port that is only at a certain speed, and at that point, traffic will back up. Now, to illustrate this, I have a picture of a car analogy trying to enter a highway here. I think this was some German Autobahn, but it's the same in New York, in a traffic jam. Basically, if you have multiple streets coming together, it will only get through at that rate of this egress point, and there's nothing anybody can do about this. And in the real world, you can't drop the cars either. And so, now what you want is this picture, where the traffic moves smoothly, there's matching input and egress capacity, and everything is scheduled in advance that they never collide, right? So this is another illustration of this. Basically, if you have perfect flow load balancing, all these flows traveling in your network in parallel, and remember, it's not 1000s, 10,000s, 100,000 flows, will complete at the same time, and they all have to complete to get to the next step of the calculation. If some of the flows receive less bandwidth than others, they will still complete, but it will take longer, and thus the whole calculation slows down. So this is a traditional multistage Clos fabric. There was a gentleman called Clos in the 1950s that invented these multistage networks for telephony at the time. And yes, in theory, it's non-blocking. If all the flows are routed perfectly, and there's no incast congestion, no transient oversubscription, it is non-blocking. However, in the real world, it looks more like this. Let's say, all the people on the left suddenly want to talk to the guy on the lower right. The traffic will back up, either because you have link-level flow control, which blocks that link from further traffic. Yes, you can route around it, but at the end point, you still can't get out. 
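The point about one slow flow delaying the whole calculation can be put in numbers. The following is a minimal sketch, not a model of any real fabric: the function name, the four 1 GB flows, and the 400-gig and 200-gig rates are all illustrative assumptions. It treats a collective step as finished only when its last flow finishes.

```python
def job_completion_time(flow_bytes, flow_rates_bps):
    """Seconds until every flow in one collective step has finished.

    The step completes only when the slowest flow completes, so the
    result is the maximum of the individual flow transfer times.
    """
    finish_times = [8 * size / rate for size, rate in zip(flow_bytes, flow_rates_bps)]
    return max(finish_times)


GBIT = 1e9
flows = [1e9] * 4  # four flows of 1 GB each (illustrative sizes)

# Perfect balancing: every flow gets a full 400-gig share.
fair_jct = job_completion_time(flows, [400 * GBIT] * 4)

# One flow is squeezed to half rate by congestion; the others are unaffected.
unfair_jct = job_completion_time(flows, [400 * GBIT, 400 * GBIT, 400 * GBIT, 200 * GBIT])

print(f"fair: {fair_jct * 1e3:.0f} ms, one slow flow: {unfair_jct * 1e3:.0f} ms")
```

Under these assumed numbers, three of the four flows are untouched, yet the step as a whole takes twice as long, which is why fairness across flows matters as much as aggregate throughput.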
So, as the network fills up with bursts of packets, it prohibits or impacts other flows to get through, and that is resulting in less throughput. So I'll spare you the details here, but there is 1 architecture that works much better, known as virtual output queuing, which is basically a congestion-free fabric, and with VOQ, there's no incast congestion, there's no fabric congestion, it just works. And this is, again, technology from Broadcom, known as the Jericho3 generation now, with the Ramon3 chip, that essentially performs perfect load balancing, equally spraying traffic over all the links, at 100% utilization, where the receiver schedules the traffic. So the receiver says, I'm ready for my next packet, and there's a fair scheduler, no, look, it's the flow, this is beautiful. Better than that, we've been shipping this architecture for years in the form of the 7800 and the predecessor 7500 modular chassis. So, the best performing AI chip on the market -- sorry, switch on the market, is right here, in the form of the large modular chassis, which in the latest instance, will support 576 ports of 800-gig, or 1,152 of 400-gig in single chassis, double the throughput of the previous generation, and can be combined, of course, in not just the single hop, but the 2-hop network, with small locality or ratings. Now, the new thing we want to talk about today is, if you take the chassis apart, and separate the fabric from the front end, you can scale this much, much larger. The architecture limit is 32,000 GPUs, or XPUs, the initial implementation is 4,000 800-gig or 8,000 400-gig, but the result is the same as the single chassis, 100% efficient, fully load balanced fabric, there's no incast congestion, no dropped packets, it just works. And this, to me, is the best way to build large scalable AI fabrics. 
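The receiver-scheduled idea behind virtual output queuing can be illustrated with a toy model. This is a sketch under stated assumptions, not the Jericho3 or Ramon3 implementation: `run_voq`, its parameters, and the round-robin credit policy are hypothetical choices made for illustration. Packets wait at the ingress in a per-destination queue and cross the fabric only when the egress grants a credit, so the fabric never carries more than the egress port can drain.

```python
from collections import deque


def run_voq(arrivals, egress_rate, ticks):
    """Simulate one egress port fed by several ingress ports.

    arrivals: dict of ingress id -> packets arriving per tick for this egress.
    egress_rate: packets the egress port can transmit per tick.
    Returns (delivered per ingress, backlog per ingress, peak fabric load).
    """
    voqs = {i: deque() for i in arrivals}   # one virtual output queue per ingress
    delivered = {i: 0 for i in arrivals}
    order = deque(arrivals)                 # round-robin order of ingress ports
    peak_fabric_load = 0
    for t in range(ticks):
        for i, n in arrivals.items():
            voqs[i].extend([t] * n)         # packets queue at the ingress, not in the fabric
        grants = 0
        # The egress grants at most egress_rate credits per tick,
        # visiting backlogged ingress ports in round-robin order.
        for _ in range(len(order) * egress_rate):
            if grants == egress_rate:
                break
            i = order[0]
            order.rotate(-1)
            if voqs[i]:
                voqs[i].popleft()
                delivered[i] += 1
                grants += 1
        peak_fabric_load = max(peak_fabric_load, grants)
    backlog = {i: len(q) for i, q in voqs.items()}
    return delivered, backlog, peak_fabric_load


# Four senders each offer 1 packet per tick to an egress that drains 2 per tick.
delivered, backlog, peak = run_voq({0: 1, 1: 1, 2: 1, 3: 1}, egress_rate=2, ticks=100)
print(delivered, backlog, peak)
```

In this run the oversubscribed egress is shared equally among the four senders, the excess stays queued at the ingress ports, and the fabric itself never carries more than the egress rate. A real VOQ system adds credit signalling delay, cell spraying across fabric links, and finite buffers, none of which this sketch models.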
Now, separately, we also have the 7060 Tomahawk 5 based, 51.2T switch, which is in production actually, and this is very popular for smaller clusters, or for multi-stage network between the 7060 and the larger fabrics. This is our most power efficient switch chip to-date, with RDMA-aware load balancing does a very good job at load balancing traffic. Now, if you then compare the scalability for single hop and 2-hop scenarios, and this is all measured in units of 800-gig, with the 7060, you can actually build a cluster with 2,000 nodes, with just 2 hops, it looks perfectly fine. But if you want to go bigger, you really want to use the 7800 type of thing, at least for the spine. If you do a leaf-spine 7800, you can actually go up to 160,000 ports, which is 130 petabits, and if you use the 7700, you can actually do a million ports in 2 hops. So we call this the Etherlink platform, or architecture, to allow the most efficient large clusters. And I'm sorry, here's some pictures about this. So again, in many cases, there will be 2 hops, but again, you get tremendous scalability, 10,000s of GPUs in these 2 hops. So the bottom line here is Etherlink is our name for the optimized AI network infrastructure that combines VOQ architecture with receiver-based scheduling, perfect load balancing, single hop or 2 hop lookups, supporting scalability to well over 100,000s of accelerators, and combined with the software, which Ken will be talking about in a second, which provides continuous fine grain end-to-end monitoring and visibility. And with that, I will hand it over to Ken. Thank you very much.
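The scaling figures quoted here can be roughly reproduced with standard non-blocking leaf-spine arithmetic. This is a sketch under assumptions that are not stated in the talk: each leaf splits its ports evenly between hosts and uplinks, every leaf has one link to every spine, and the Tomahawk 5 switch is taken as 64 ports of 800-gig (51.2 Tbps divided by 800 Gbps). The 576-port figure for the modular chassis comes from the talk itself; the function name is hypothetical.

```python
def two_tier_host_ports(leaf_radix, spine_radix):
    """Host-facing ports in a non-blocking two-tier leaf-spine fabric.

    Each leaf uses half its ports for hosts and half for uplinks (one per
    spine), and each spine port connects to a different leaf, so the number
    of leaves is limited by the spine radix.
    """
    down_per_leaf = leaf_radix // 2
    max_leaves = spine_radix
    return down_per_leaf * max_leaves


PORT_GBPS = 800  # everything is counted in units of 800-gig

# Single modular chassis: 576 ports of 800-gig.
chassis_tbps = 576 * PORT_GBPS / 1000

# Two hops built from 64-port switches (assumed radix for the Tomahawk 5 box).
small_fabric = two_tier_host_ports(64, 64)

# Two hops built from 576-port chassis as both leaf and spine.
large_fabric = two_tier_host_ports(576, 576)
large_fabric_pbps = large_fabric * PORT_GBPS / 1e6

print(f"chassis: {chassis_tbps} Tbps")
print(f"64-port leaf-spine: {small_fabric} ports")
print(f"576-port leaf-spine: {large_fabric} ports, {large_fabric_pbps:.1f} Pbps")
```

Under these assumptions the results line up with the rounded numbers in the talk: about 460 Tbps for one chassis, about 2,000 nodes for the smaller two-hop design, and roughly 160,000 ports and 130 petabits per second for the larger one.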
Unknown Executive
Well, needless to say, the vision of Arista is realized in reality, otherwise, nice vision. So what does it mean in practical terms for the network? We'll now hear from Arista Co-Founder and lead software engineer, Ken Duda, on the intertwined relationship of AI for networking and networking for AI.
Kenneth Duda
All right, everybody. Thank you so much for being here. I'd like to talk to you about Arista EOS software and AI, many of my favorite topics right there. First, a brief history of where we've come from. This is our 10-year IPO celebration. We've gone through -- actually more than that, it's now 15 years of shipping products. And over those 15 years, we've seen, every 5 years, a really great technology inflection that corresponds to a really great business opportunity, which has been tremendous for our success as well as the success of our customers. In 2009, it was high-frequency trading. Many of you in the room are well aware of this. We had the lowest latency switch in the business and did great in that market, got to profitability. In 2014, cloud really took over for us and building these data center networks at massive scale, exercising the scale of EOS and delivering a level of IP networking, that business really needed. Fantastic for us. 2019, the wide area routing became of paramount importance, and the EOS routing stack rose to that challenge. This is cloud routing around the world, across the data centers, out to the Internet and peering at massive scale. And of course, you already heard and will hear plenty more about AI as the latest technology transition that's -- it creates a huge opportunity for us. And all of these have corresponded, interestingly, to speed transitions in Ethernet, which continues to march along at a dizzying pace from 10 gigabits to 800 gigabits just in the time we've been shipping product. So it's been a great journey for us so far. I'm looking forward to 10 more years of this ahead. And one thing that's stayed constant through this whole journey has been the EOS stack, the vision of the switch software done right. 
Switch hardware running the same operating system across all of these use cases, everything from the little campus switch with 24 ports of POE, up to the 576-port 800-gig Godbox that Andy was showing off, all running the same code. And then that feeding data up into our network-wide data lake, 1 data lake, with state about the entire network across the whole estate. No other vendor has anything comparable to this, with CloudVision and other applications running on top. But I actually wanted to talk a little bit more about NetDL, because NetDL, the data lake, is so foundational to how we can add value beyond selling individual boxes and instead deliver for our customers' fabrics, enterprise-wide solutions to their networking and security problems. It's through having the data in NetDL. NetDL is a data architecture based on open-source scale-out technologies. It's multimodal, meaning we store different types of data in NetDL. In HBase, we store hierarchical data, which are system databases on our switches, routing tables, configuration, ACLs, VLANs, all that sort of stuff. In Elasticsearch, it's all the free text stuff, syslogs mainly. And in ClickHouse, the columnar data, the analytics, flow data, statistics, counters, all that sort of thing. And we can put these all together, create a uniform API, uniform access, authentication, authorization and a unified set of APIs on top of this, which is what we've done to build. We gather this data from all across the network. 
This is a multi-domain data lake, meaning it's taking data from all types of devices, IT, IoT, BYOD, servers, clients, everything, from every domain of the enterprise, the campus, the data center, the wide area network, and this data, not just networking details, but also about users, about applications, about services, bringing in information not just from switches, but also from third-party sources of information about the network, which allows us to contextualize that information to understand what users are using the network, causing these flows to exist, what applications are those flows a part of. And that's key to how we're able to build applications on top, including AVA, AI-driven, that makes sense out of what's happening in the network, that look for threats, that give you the quality of experience evaluation, that help the customer understand what their network as a whole is doing. It's getting all of the state into one place, the NetDL architecture that enables all of this, and then also third-party applications, third-party integrations, then run on top of that as well. So the NetDL piece is key to how we're able to deliver network-wide solutions that will be very difficult for our competitors to match. But of course, you need to have the software on the switches as well, and this is where EOS has been so key for us, because EOS has all the features that you need. And this is true for both in an AI context, but also in a network-wide context. Many of these features end up being important for AI use cases as well. But it's not just the features. It's also the best quality experience in the industry. And this is extremely important to us and our customers that we continue to deliver the best quality experience in my commitment to my customers and to all of you, is that we will always prioritize quality above shipping new features, shipping new platforms, meeting delivery schedules, because when the network ain't working, ain't nothing working. 
It's too important to play those games and take risks, and that is our commitment to our whole ecosystem. EOS for AI benefits in particular from some EOS features I'd like to highlight. Smart system upgrade. This is the ability to simply upgrade the software on a live system. You might say, well, that's nice, but how often do you have to do this? So what if your AI cluster has to go down every now and then? Well, with a small cluster, it may not be a big deal. But when you're building an AI cluster at scale, the cluster is being continually chopped up and sort of reallocated to different workloads, to different jobs. These jobs take hours to run, and if you disrupt them, they typically have to be restarted. It's very disruptive to the consumers of the AI workloads to have any kind of disruption in that fabric. And so, smart system upgrade actually ends up being pretty important. PFC Watchdog. This is a feature we actually implemented originally for storage clusters. It's great for AI clusters as well. With priority flow control and Ethernet-networks, you've got to be a little careful, because 1 malfunctioning device can back up traffic. It's like if 1 freeway off-ramp jams up, and people start just backing up, and it spreads down the freeway, and pretty soon the whole thing grinds to a halt. PFC Watchdog detects this and shuts down the offending system, and is really important to the reliability of these fabrics. And of course, best software quality, as I already mentioned, is important for AI as well. But we've also been doing some work in EOS on very AI-specific focused features. And this is on the Tomahawk 5 platform. We've done some work in Etherlink with automatic load balancing across the fabric. We have various participants in our industry making claims about what Ethernet can and cannot do. And I don't think they're necessarily thinking completely about all the things that Ethernet can do. 
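The PFC Watchdog behaviour described here, noticing a queue that stays paused and shutting it down before the backup spreads, can be sketched as a small timer per queue. This is an illustrative model only, not EOS code: the class, its fields, the 200 ms timeout, and the interface names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PfcWatchdog:
    """Flags queues that stay paused, with traffic waiting, for too long."""

    timeout_ms: int
    paused_since: dict = field(default_factory=dict)  # queue -> time the pause began
    shut_down: set = field(default_factory=set)       # queues the watchdog has disabled

    def observe(self, queue, now_ms, paused, backlog):
        """Record one poll of a queue; return True if it was just shut down."""
        if queue in self.shut_down:
            return False
        if not (paused and backlog > 0):
            # The queue is draining or idle, so any earlier pause has cleared.
            self.paused_since.pop(queue, None)
            return False
        start = self.paused_since.setdefault(queue, now_ms)
        if now_ms - start >= self.timeout_ms:
            self.shut_down.add(queue)
            return True
        return False


wd = PfcWatchdog(timeout_ms=200)

# A stuck peer keeps this queue paused across three polls 100 ms apart.
stuck = [wd.observe("port1/prio3", t, paused=True, backlog=10) for t in (0, 100, 200)]

# A healthy queue is paused briefly, resumes, and is paused again later.
healthy = [
    wd.observe("port2/prio3", 0, paused=True, backlog=5),
    wd.observe("port2/prio3", 100, paused=False, backlog=5),
    wd.observe("port2/prio3", 250, paused=True, backlog=5),
]
print(stuck, healthy, wd.shut_down)
```

The design choice worth noting is that the timer resets whenever the queue makes progress, so normal short pauses, which are how priority flow control is meant to work, never trip the watchdog.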
So what this picture shows here, in yellow, is what you get with traditional Ethernet flow balancing. This is where each switch hashes a flow header and uses a hash function to distribute flows across uplinks. And the problem with this, as Andy was talking about, is that when these 2 flows hit the same link, the whole thing slows down. Not just those 2 flows, but the entire job marches to the beat of the slowest drummer. And so, you get this sort of flattening out of performance once the network reaches a scope where you're getting those kinds of flow collisions in your network. Now what you see in the blue graph there is what you get when there is no flow hashing, when the scale is small enough that all the flows can go through a single switch. What you see in the red is a multi-tier scale network with the flow balancing turned on. What you see is identical performance to blue, which means that this sort of fabric load balancing problem is essentially solved by our Ethernet technologies here. So we're pretty excited that we can -- the Ethernet doesn't stand still, and we can solve the practical problems using a technology that's so widely deployed. Because we know at the end of the day Ethernet will win this fight because of simply Metcalfe's law. The value of a network is quadratic in the number of nodes. The size of the installed base of Ethernet is overwhelming. So this will be the future technology for AI fabrics as well. And we're able to do this at tremendous scale with EOS. Here's a design with a 7060 Tomahawk 5-based leaf and a 7800 Jericho3 spine. This easily scales to 64K GPUs, 400-gig links per GPU, full line rate, non-blocking bandwidth across this entire thing. And we can build even larger networks as well, as Andy mentioned. And finally, EOS and CloudVision together provide AI fabric visibility -- fabric-wide visibility. And this is something that competitor switches don't do a very good job of honestly. 
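The flow-collision effect described for hash-based balancing can be seen in a small experiment. This is a sketch, not a measurement of any switch: a seeded random choice stands in for hashing packet header fields, all flows are assumed equal in size, all uplinks equal in speed, and the function names are hypothetical.

```python
import random
from collections import Counter


def max_link_load_hashed(num_flows, num_links, seed):
    """Busiest uplink when each flow is placed by a hash of its header.

    A seeded random choice stands in for the hash function (assumption).
    """
    rng = random.Random(seed)
    loads = Counter(rng.randrange(num_links) for _ in range(num_flows))
    return max(loads.values())


def max_link_load_balanced(num_flows, num_links):
    """Busiest uplink when flows are spread as evenly as possible."""
    return -(-num_flows // num_links)  # ceiling division


NUM_FLOWS = 64
NUM_LINKS = 64

hashed = max_link_load_hashed(NUM_FLOWS, NUM_LINKS, seed=1)
balanced = max_link_load_balanced(NUM_FLOWS, NUM_LINKS)

# With equal flows, the flows on the busiest link each get 1/load of the link,
# and the job waits for them, so the slowdown equals the busiest link's load.
print(f"hashed: busiest link carries {hashed} flows")
print(f"balanced: busiest link carries {balanced} flow(s)")
```

Even with exactly as many links as flows, independent hashing almost always places several flows on one link while leaving others idle, and because the job runs at the pace of its slowest flow, that single collision sets the completion time for everyone.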
And what we're able to do with EOS is gather data from all of the leaf switches about how the network is performing. We have LANZ for detecting if there's any latency through the switch. We have a queue-depth monitor to figure out if any congestion is building up within the switches. We have ECN mark counters. ECN is a TCP feature where if some other node in the network has marked the packet as having experienced congestion, we're counting that. We have RDMA counters. The switches are aware of the AI protocols and counting the different types of protocol messages so the operator can see what the overall network load is of their AI training systems. And we've got an EOS agent running on the host inside the NIC, providing visibility to CloudVision of what's happening inside the host, inside the driver, on the host bus, understanding if there are queuing and delay issues there as well. So we're able to put the whole picture together in CloudVision where we're capturing this data at a granularity that's 100 microseconds per sample, capturing 600,000 samples per second. It's the kind of fine-grained visibility that you need to understand the kind of microburst problems, very short-lived queuing and congestion problems that can be the enemy of good performance in these fabrics. So that's what I've got to say about EOS for AI. Now how is AI helping Arista help our customers do better? This is AI for networking. And our strategy here is to do what works. We have competitors that I believe are overselling what today's generation of AI technologies can actually enable within the context of the network. So what we've been doing for actually several years now is predicting transceiver and power supply failure with simple machine learning models, predicting table exhaustion. As your network grows, at what point in time is your routing table going to fill up? Is your IGMP snooping table going to fill up? Predicting link saturation. 
As traffic is growing in your network, when are you going to have bottlenecks? Estimating the application quality of experience from TCP data, gathering network-level data, and telling the customer what we expect their customers are experiencing in network application performance based on what we're seeing at the network layer. Identifying related events. When more than one thing is happening in the network, how are those things related and how do we gather them up to identify an incident of related events that are happening together? And of course, network detection and response threat hunting is heavily AI-driven. And in addition, recently we've added a couple of things. Ask AVA for natural language queries to be able to ask questions about what's happening in your network without having to know the details of every last page of the CloudVision GUI and getting summarized answers. And then also, just internally within Arista, training LLMs on our history of support cases to enable our support staff, as that scales, to quickly locate similar cases in the hopes that they can find patterns and more quickly get to the root cause of customer issues. So, AI is important for us and our customers as well. And I'd now like to show you a quick video which makes some of the points of how EOS serves in an AI network. Thank you very much. [Presentation]
Kenneth Duda
executiveOkay. So that's vision, how it's being realized. But for Arista, beyond any other tech company, it's about the partnering on the way to executing and delivering for the customer. Partnering is critical in this. And as Andy talked about in conjunction with the 7800R4 switch, Arista has been working very closely with Broadcom on making this a reality, and there's much more growth from here. And here to talk a little bit about what they're doing together, I'm delighted to welcome to the stage, Hock Tan, CEO of Broadcom.
Hock Tan
executiveWell, thank you very much, Jayshree, the Arista team, for inviting me to speak today to celebrate this milestone. On behalf of Broadcom, congratulations on the 10th anniversary of your IPO. Well deserved, and it's an exciting marker in the company's history, and I'm pleased to be here. More than that, I have to be here, because you know why? Arista and Broadcom have been partners, real partners, in innovating Ethernet solutions for the past 10 years on open and interoperable Ethernet solutions. And it's a great partnership because it takes the best-of-breed silicon from Broadcom with the best network operating system in the world -- in the industry. And together, we've created a platform that is the industry's highest performance product around. It's not just about Arista and Broadcom getting it there, it also validates very much the business model I've always believed in, which is both of us focusing on our respective core competences in creating this platform. And I like to make it certain, because at the end of the day, I believe philosophically that just because you do the best CPUs, that does not entitle you to make the best networking, and the opposite is the case, and that frankly validates this very, very clearly. So moving on -- we have combined been very successful in capturing and navigating market transitions. You see it up here. 10 years ago, over 10 years ago, in fact, around 10 years ago, the hyperscalers' cloud data centers were built with large monolithic systems. They do not scale. We partnered with Arista, I should say, Arista partnered with us very well with our Tomahawk switches to deliver cloud scale networking. Today, that is the solution most prevalent. Five years ago, a similar opportunity showed up in routing, okay? Telco -- service providers were buying very complex, very large routing systems, which could be much better produced, designed, compact and with higher power efficiency. And that's what's happening more and more today. 
Then more recently, less than 3 years ago, even before ChatGPT showed up and was introduced, we started working together to produce a back-end network for AI. You heard Ken talk about EOS for AI. Well, we also did together with that Jericho AI and the end result was the Arista 7800 switch, which is the best product, as Andy said, out there for AI back-end networks today. And so let me talk to you a bit about what's so unique about this AI fabric. Okay. Essentially, what it is, is AI is a distributed computing problem. It is a problem because no matter how large any GPU could be, the way the large language models are exponentially progressing, you need many of those GPUs, to the point of thousands, tens of thousands and beyond, to be clustered together to run a single job -- workload process in parallel. And each job requires these parallel processing workloads to synchronize, to coordinate. And the only way you can do that is through a high-performance network. And in a nutshell, the network has become the computer and the performance of the network has a direct impact on the performance of your large language model training. So I'd like to add -- the network of choice today is Ethernet. Seven of the largest AI deployments, clusters which are deployed in the world, are connected through Ethernet, shown up there. And this includes both front and back-end. And there are very many reasons why this is the case. Let me just give you 2. For one, Ethernet is available. It's open, interoperable and has a strong ecosystem. Secondly, and you've heard that from preceding speakers, it's just built for scale. Let me give you a practical example of why Ethernet outperforms InfiniBand, or any other protocol for that matter. When you're building these large clusters and when people talk about InfiniBand, they talk a lot about lossless networks. In reality, no network is lossless, because at this high bandwidth you use optics. 
Just think about a 4,000 GPU cluster: you have to put in place 10,000 optical interconnects. Optics, you may know, fail on average 2% to 5% each year. That's a minimum of 15 failures per month, even on a cluster with 4,000. Multiply 10x to 40,000, which is what these days more and more of these AI clusters are, and you really have a network -- cluster, an AI data center that will not be performing very well, unless you have a protocol, Ethernet, that can re-converge quickly. Ethernet does that. It does it 30x better than InfiniBand. So the clusters then just keep growing bigger and bigger, as they're shown here. And here are the proof points of Ethernet at scale already deployed so far. And by the end of this calendar year, you would expect more than one of these hyperscalers -- one of these enterprises to have a deployment with more than 100,000 GPUs in a cluster. And you often might want to ask -- why are we heading up even more and more? Well, I have a couple of observations, which I'd like to share. You heard it. These large language models continue to grow. Why? As a matter of fact, with a 30,000-GPU cluster, you can train today on the entire data on the Internet, just 30,000. So why do you need to keep going to 100,000, much less as I put the [ market ] of 1 million. When you're at 30,000 and you train on the entire Internet database, think what it does to your corporate database? You got it. You have all the bases to create productivity improvements in running corporate organizations, in getting AI to help their thing. So why is that -- why is that continuing to grow? And I'd like to say -- suggest certain thoughts because it does suggest what is the potential -- many of you are investors, so you're interested in it? What is the potential? What is the TAM, Total Available Market that we are seeing or I am suggesting in this space. And it comes down to this, okay? If we have enough AI generating power, why do we keep going? 
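The optics failure arithmetic quoted above can be checked with a short sketch. The optic count and the 2% to 5% annual failure range come from the remarks; the function name and the assumption that failures spread evenly across twelve months are illustrative simplifications.

```python
def expected_failures_per_month(num_optics, annual_failure_rate):
    """Expected optic failures per month, assuming failures are spread
    evenly over the year (a simplifying assumption)."""
    return num_optics * annual_failure_rate / 12

# Figures from the remarks: roughly 10,000 optics for a 4,000 GPU cluster,
# failing at 2% to 5% per year.
low = expected_failures_per_month(10_000, 0.02)
high = expected_failures_per_month(10_000, 0.05)
print(f"4,000 GPUs: {low:.1f} to {high:.1f} failures per month")

# Scaling the cluster (and its optics) by 10x scales the failures by 10x.
print(f"40,000 GPUs: {10 * low:.1f} to {10 * high:.1f} failures per month")
```

At the low end this gives about 17 failures per month, in line with the "minimum 15 failures per month" figure in the remarks.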
And the reason we keep going is because of AI today, under what we call the algorithm that runs it. It's all about regressing data, the huge amount of data you have, and giving you point solutions. The next step is you get an algorithm, large language models, that next progression, that can reason like us, that can look at infinite choices and pick the best like us. We are trying to create -- recreate a human brain in a machine. And I have to say the technology exists today, and so why stop at 100,000? At 30,000, you can train an entire database on the Internet. What would 1 million do? Basically, convergence to AGI. And that's pretty much where this whole thing seems to be headed. So in that case, ask ourselves how big is -- how many of these people are trying and how many parties are there that are trying to do these million-GPU clusters? I would say probably 5. Do the math, each of these million-GPU data centers would cost $40 billion, $50 billion on average. $5 billion of that is networking. Over the next 4, 5 years, that's $25 billion. Take another 50 digital natives at maybe a scale of 100,000. That's another $25 billion at the same ratio of networking, $50 billion over the next 4 to 5 years, $50 billion. And that's the TAM just on large language models and the buildup of AI data centers. Now go one step further. All this great intelligence, when we achieve it, it's going to be consumed, right? Otherwise, why do it? And when it gets consumed, what happens? Well, we have to absorb the -- to deliver these services, we have to look at upgrading infrastructure. Routing service providers have to upgrade their routing networks, and the campus networks to the enterprises, broadband to the homes, AI at the edge. I believe that's another kicker over the next 5 years of $50 billion. So we have ourselves an opportunity of $100 billion incremental available market in networking, just in networking. 
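The TAM arithmetic above can be laid out step by step. This is a sketch using only the round figures from the remarks; the variable and function names are hypothetical, and the per-cluster networking spend for the 100,000-GPU tier is derived by assuming it scales linearly with GPU count ("the same ratio of networking").

```python
def networking_spend_busd(num_builders, networking_per_cluster_busd):
    """Total networking spend in billions of dollars for a tier of builders."""
    return num_builders * networking_per_cluster_busd

# Tier 1: about 5 parties building million-GPU clusters, $5B networking each.
mega_tier = networking_spend_busd(5, 5.0)

# Tier 2: about 50 digital natives at 100,000 GPUs each. Assumes networking
# spend scales linearly with GPU count, so each spends one tenth of $5B.
per_cluster_100k = 5.0 * 100_000 / 1_000_000
native_tier = networking_spend_busd(50, per_cluster_100k)

ai_data_center_tam = mega_tier + native_tier

# Second leg from the remarks: infrastructure upgrades to consume AI services.
infrastructure_upgrade_tam = 50.0
total_tam = ai_data_center_tam + infrastructure_upgrade_tam

print(mega_tier, native_tier, ai_data_center_tam, total_tam)
```

The two tiers each come to $25 billion, giving $50 billion for AI data centers and $100 billion once the infrastructure upgrade estimate is added.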
And that's the interesting opportunity I see today -- sorry, going too fast, that Arista combined with Broadcom will have an opportunity to monetize and capitalize. And with that, I just leave that for you to think about. And again, I want to thank Jayshree and the team. They and their engineering teams have been very, very good people to work with. And again, congratulations. Thank you.
Jayshree Ullal
executiveSo you just increased my TAM, Hock. I said $70 billion. You said a whole lot more.
Hock Tan
executiveYou don't think I did something to your enterprise value?
Jayshree Ullal
executiveThank you for setting higher expectations. But I just want to echo what a great friend and partner you've been. And part of being a partner is there's a lot of give and take, right? Today, I want to give you something. I'll take some more chips, Hock, if you'll give them to me. And I just want to, on behalf of Arista, present to you the Partner of the Decade Award. You have been a partner and we have together been strong proponents of merchant silicon. And even though we've always looked at other merchant silicon, it always comes right back to Broadcom. So congratulations, and thank you so much. One more thing before you try to leave. Hock actually gave up his Board meeting to be here. So thank you for prioritizing your partner over the Board members. I appreciate that. I hope I won't get you...
Hock Tan
executiveI wouldn't do it any other way.
Jayshree Ullal
executiveHarry, I hope that was okay, yes, okay. People have influenced our lives and you just heard of an example of Hock and how much he's meant to the best-of-breed silicon and the best-of-breed software. Early in Arista's career back in 2008, '09, '10 in our struggling years, you always look -- it's a very lonely spot to be as a CEO. And Andy and I always look for mentors. And it gives me great pleasure to announce Joe Tucci, the Former CEO of EMC, to come up on stage and say a few words as my mentor and friend of Arista for all these years. Joe?
Joseph M. Tucci
executiveThis is unexpected. Well, glad to hear it. Actually, Arista goes back 20 years. Andy founded Arista with 2 partners in October of 2004. And it was called, I think, Arastra -- but Andy showed great sense in 2008 and brought in one of the great leaders of all time, Jayshree, who joined the company, and that changed the total course. So I would like to join everybody else who spoke today and congratulate the entire Arista team on the last 10 years as a public company, just tremendous market share, gained tremendous notoriety for the company and congratulations. And I know you don't do that without great leadership. So you are indeed fortunate to have Jayshree and Andy at the forefront. They're amazing people. Good founders -- they're smart. They're smart technically. They're smart, just math-wise, they're smart. English-wise, they're just smart people. But that's not enough. You got to have -- in my day, we called it an audacious goal. You got to be able to build a goal that's bigger than yourself, that stretches the whole company, and you did build that audacious goal, and they not only built it, they achieved it. So to me, great leaders are not only smart but they're also great human beings. People you want to be around, and I've been very fortunate to be a little bit around Andy and Jayshree -- nothing to do with their success, but they are tremendous people, and I'm proud to call them friends. And as Frank Sinatra said, back about 50 years ago, the best is yet to come for Arista. I'm sure. Thank you. Sorry, I'm shaking a little bit, but I got -- I just -- if you're over 70, if you just ask my advice, do not do a back operation. Don't do it.
Jayshree Ullal
executiveYou have not only influenced Arista, but today, I'd like to present you the Lifetime Achievement Award for all you've done during your time in Storage with VMware and of course, with Arista, congratulations. We're so honored to have you. Give it up.
Kenneth Duda
executiveThis close commitment you've seen, the mentoring you've seen has borne so many more -- so many results with many more to come. I mentioned earlier, Jayshree's blog, and you've seen much of the work they've done already with much more to come. And I'd like to now -- it's demo time. I'd like to welcome [ Simon Capper ] on to the stage to show you some of the new work in the AI and NICs. Simon?
Unknown Executive
executiveSee the investors, the investment flow. So I'm going to talk about AI centers today and give you a quick demo. And I'm representing the work that the whole software team has put into this over the last many months. I know they're watching. So give them a quick shout out. The heart of the AI Center is EOS. Ken has already spoken about that today. We are built on top of that. EOS provides a rich set of programming interfaces, and you can see some of them on the screen here, which our customers use to orchestrate and manage their networks. With the addition of AI centers, customers are now able to leverage EOS' configuration widgets, including open standard ones like OpenConfig, to provide a homogeneous point of control for all of the network and hardware in their network. They can do this by integrating the new AI NIC and server interfaces that we've added to EOS with their own tools, or alternatively use Arista's CloudVision, which other presenters have talked about. AI centers relies on the Arista AI agent, or AI NIC agent in this picture here. It runs directly on the NIC itself, in the NIC's own CPU core, or alternatively inside the host itself. And it will connect to the adjacent Arista switch to continuously send telemetry and receive configuration updates to keep it in the correct state. The engineering design itself is NIC, server and vendor agnostic. And we have so far today integrated or are working to integrate with several vendors, including AMD, Broadcom, Intel and NVIDIA NICs. So let's look at a quick demo. This is from a demo switch in our labs. And this is using our CLI interface, probably the first interface we ever built, but it does show you what's going on. So the operator here is asking the switch to find. [Presentation]
Unknown Executive
executiveApologies for that. So back to the CLI. So in this case, the operator has asked the switch: can you see any NICs in the network that are enabled with the AI agent? And in this case it shows 3. The operator can then enable the AI agent. So initially, it will come up as unconfigured and then the operator will go ahead. And yes, I can get the slides to work -- the operator will go ahead and -- well, you'll have to believe me because that slide is not functioning for me. I guess we have some AV issues today. I apologize for that. The operator can actually then log in and change the configuration on the server and on the switch to actually enable the AI NIC and the server to communicate with EOS. And then any configuration that has been configured into EOS like PFC settings, QoS settings, anything that's important for the AI traffic to flow properly between the NIC and the network, those settings will be set up such that you do not have any configuration errors. We have had customers where they take several weeks to get their network installed and running, and this effectively eliminates those problems. Now I mentioned before that AI center can be integrated with customer orchestration systems, and that's important for some of our big customers that have developed many years' worth of software to run their networks. But we have also extended CloudVision to be part of the AI Centers offering. CloudVision collects the data from the NICs via the switches and the AI agents in the network and provides a level of observability that's not available on an individual switch or a host. In this example, you can see 5 pairs of NICs at the bottom there. Two of them, or one pair, are unconfigured. What's happening here is this would be maybe a new install and the operator will go in, configure these NICs, set up various parameters that need to be configured to ensure correct network operation, and that's done. It's finished. 
So in this example, we've now got all the new NICs, in this case, two, because more doesn't fit on the screen, and they have now been set up and are now fully configured and are ready to run AI jobs. Another advantage of having CloudVision is that we can see and observe many NIC metrics and overlay those metrics and correlate other events in the network. In this case, take the example of a network operator who has been alerted by the compute team of some sort of AI performance issue. In this case, the operator is able to see on a particular NIC that it had RDMA errors, which is an indication that the AI job had an issue at the time that the compute team reported the error. This allows the network operator to identify which NIC, which server triggered the job problems, and they can then schedule maintenance or service on the server itself or the older NIC. And they can then, of course, report back to the compute team, hey, I figured out what went wrong. This is what was happening in the network. Now in addition to giving the operator the ability to see and manage data in the network, CloudVision can also correlate seemingly unrelated events. In this particular case, we have an RDMA packet sequence error on the NIC. CloudVision has determined that at the same time that error occurred, we had a transmit drop or discard error on a switch, along with a queue congestion error. So CloudVision has correctly identified here that even though the operator or the compute team may have seen an RDMA packet error, CloudVision has looked at the other telemetry in the network and identified that this was due to congestion in a switch. This obviously helps the network operator focus on debugging the real issue rather than wasting time having to do all this themselves. This is a very important part of CloudVision. And CloudVision has a store of all of these events in this intelligent data lake. 
So it can go back in time and root cause these problems after they've occurred. Additionally, CloudVision can be configured to generate automated alerts when it detects an error in the network. These alerts can be integrated into existing tools like Slack or e-mail or Hangouts or whatever messaging system the company uses. This particular example is on Slack. And here we can see that, again, we've got this RDMA error on a particular Ethernet interface. These alerts can actually trigger automated maintenance. If you had a situation where a link was failing due to a cabling problem, you could schedule cable replacement. This reduces the turnaround time required to repair equipment failures and reduces the amount of time that the highly skilled operators have to spend debugging well-understood types of equipment failures. That concludes my demo and welcome to the new AI centers. Thank you.
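The event correlation described in the demo, where a NIC-side RDMA error is matched against switch-side drops and congestion at the same moment, can be sketched as a simple time-window join. This is only an illustration: the event names, device names, timestamps, and the one-millisecond window are hypothetical, and nothing here reflects how CloudVision is actually implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    ts: float     # event time in seconds (hypothetical clock)
    source: str   # device that reported the event
    kind: str     # event type label (hypothetical names)

def related_events(events, anchor, window_s=0.001):
    """Return events from other records that fall within window_s of the anchor."""
    return [
        e for e in events
        if e is not anchor and abs(e.ts - anchor.ts) <= window_s
    ]

events = [
    Event(12.0004, "nic-3", "rdma_packet_sequence_error"),
    Event(12.0001, "leaf-1", "tx_discard"),
    Event(12.0002, "leaf-1", "queue_congestion"),
    Event(47.5000, "leaf-2", "link_flap"),
]

anchor = events[0]  # the NIC error the compute team noticed
related = related_events(events, anchor)
print([f"{e.source}:{e.kind}" for e in related])
```

A window join like this only works if NIC and switch timestamps are comparable, which is one reason fine-grained, fabric-wide telemetry matters for this kind of root-cause analysis.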
Kenneth Duda
executiveWatching that something powerful occurred to me. I've been around tech for over 30 years now. When I came, Andy Bechtolsheim was envisioning the day there would be Gigabit Ethernet. And in that time, we've seen the Internet, mobility, cloud, cloud WAN, high-frequency trading and today, 800 gigabit. And one thing has not changed. The demo gods are pitiless, every time. Anyway, that's fantastic. You will see more of it in the future. We've only started. I'd like to bring up Chantelle for some closing words for the home audience. Chantelle?
Operator
operatorPlease welcome to the stage Chantelle Breithaupt.
Chantelle Breithaupt
executiveNow, you can hear me? Great. So you know what a great -- time together, some recognition, some thought leader presentations, but now we have 3 fun awards. You don't want to have a presentation where there are some fun awards at the end of it. So we have 3 of them. And Jayshree, I'll hand it over to you to do the first one.
Jayshree Ullal
executiveSo the first one is for Paul Silverstein, who wrote the very first note before the IPO and actually believed in us and thought enough to write a note on us when we were still a private company. Where are you, Paul? Come on -- come on down for your special award here. Paul, as you know, is prolific in his writing and he got us going. He inspired us to do more work as he wrote about us. Congratulations. Thanks for coming down. I know you have a full-time job -- that doesn't make it easy. Thank you. Well, the next award is: I woke up on June 6 to see the first ANET symbol and report from none other than Brian Marshall. Where are you, Brian? Come on down. What was our market cap then? $3 billion. Okay. We've come a little way from that. Thank you so much and congratulations.
Chantelle Breithaupt
executiveAll right. And I get to introduce the last award. So as you guys can probably guess, I read a lot of reports. We read a lot of reports. And every once in a while, there is a title that makes you laugh or cry. But most likely, it gets a 1980s or 1990s or an early 2000s song stuck in your head for the rest of the day. So for the best one-liner titles, the Clever Title Award, [ Mr. Jim Fish ]. [Technical Difficulty] So thank you so much for that. So I'm just going to officially end the webcast portion of this program. We have some things post webcast. So thank you to everyone that joined online. Hopefully, you have seen our celebration and seen enough to cement your conviction for the next 10 years for Arista going forward. So thank you so much for joining. And now -- so [ Quentin ], are you coming back up for the remainder of the activities?