Advanced Micro Devices, Inc. (AMD) Earnings Call Transcript & Summary
November 8, 2021
Earnings Call Speaker Segments
Lisa Su (Executive)
Welcome, everyone, and thank you all for joining us. Today is all about the data center, and I'm looking forward to showing you the next generation of AMD products that will extend our technology leadership over the coming years. We're in a high-performance computing mega cycle, driven by the growing need to deploy additional compute performance delivered more efficiently and at ever larger scale to power the services and devices that define modern life. At AMD, we're focused on pushing the envelope in high-performance computing every day. We have made significant investments in multigenerational road maps to deliver leadership edge, networking and cloud solutions. And in the data center, workloads are diversifying and becoming even more compute-intensive. This requires specialized approaches to address these unique needs. We see data center compute evolving into 4 distinct categories. General-purpose computing covers the broadest set of mainstream workloads, both on-prem and in the cloud. Socket-level performance is an important consideration for these workloads. Technical computing includes some of the most demanding workloads in the data center. And here, per-core performance matters the most. Accelerated computing is focused on the forefront of human understanding, addressing scientific fields like climate change, materials research and genomics. Highly parallel and massive computational capability is really the key. And with cloud-native computing, maximum core and thread density are needed to support hyperscale applications. To deliver leadership compute across all these workloads, we must take a tailored approach focused on innovations in hardware, software and system design. Today, we're going to talk about our next-generation data center CPUs and GPUs, specifically designed for these workloads. This includes new cores, new packaging and process technologies and new products.
And we'll share with you how we bring it all together to power the exascale computing era. Now let's start with data center CPUs. We've made tremendous progress in the data center over the last 4 years. EPYC processors set a new trajectory for the industry in both performance and scalability, while delivering new levels of data security. And as a result, we're seeing growing customer preference for EPYC. In fact, today, we have shipped more than 200 million AMD EPYC cores, powering the daily computing experience for billions of people across the cloud, enterprise and HPC. We've seen tremendous cloud adoption and have built deep partnerships with many of the largest cloud companies in the world, including AWS, Azure, Google Cloud, Tencent and Oracle Cloud. Earlier this year, we introduced third-gen EPYC with leadership performance, core density and power efficiency, enabling the largest cloud companies to deploy at scale with the best total cost of ownership. Today, I'm pleased to announce that Facebook, now called Meta, is the newest cloud partner adopting AMD EPYC processors. We've been working together to jointly define an open, cloud-scale, single-socket server designed for world-class performance and power efficiency. We optimized third-gen EPYC for their newest [indiscernible] systems to deliver leadership performance per watt across major workloads. We'll announce more details together at the Open Compute Summit later this week. I want to thank Facebook for the partnership and strong engineering collaboration. We're extremely excited to work with them to support their future data center expansions. This means EPYC is now designed into the data centers of 10 of the world's largest hyperscalers. Now the performance, features and efficiency of third-gen EPYC are also driving strong adoption in the enterprise market. Some of the best-known Fortune Global 500 companies have deployed EPYC in their data centers to run their most important workloads.
SAP is one of the leading producers of software for managing business processes. Many of the largest companies in the world use solutions based on SAP S/4HANA. Today, we're very excited to announce a new step in our partnership with SAP, focused on EPYC-powered infrastructure as part of the RISE with SAP offering anchored by SAP S/4HANA Cloud. Together, we'll improve the TCO for our joint customers while also reducing the carbon footprint of the platform. And AMD expects to be one of the first adopters of S/4HANA Cloud solutions hosted on our AMD EPYC-powered infrastructure. I'm really thrilled to expand our partnership. While the momentum we've achieved with EPYC is great, we're always pushing the envelope on CPU performance and performance per watt. So let's get to our first major product innovation update today. Our investments in packaging innovation have been a multiyear, multi-technology journey. We introduced HBM and silicon interposer technology in our GPUs in 2015, delivering industry-leading memory bandwidth in a small form factor. We set a new performance trajectory for compute in the data center and PC markets in 2017 with multichip modules. In 2019, we introduced chiplet technology, combining chips built using different process nodes in the same package, which really enabled significantly higher performance and capabilities. And at Computex earlier this year, I showed you the next big step for the industry, one we developed in close collaboration with TSMC based on their 3DFabric technology, combining chiplets with die stacking to create a 3D chiplet architecture for high-performance computing products. The first demonstration of our 3D chiplet technology stacked cache memory directly on top of a Ryzen desktop prototype to deliver a significant increase in gaming performance.
Today, I'm excited to announce that we're bringing 3D chiplet technology to the data center and our EPYC CPUs, adding a new 3D cache design to the leadership Milan product family. We're using an industry-first hybrid bonds [indiscernible] approach that provides over 200x the interconnect density of 2D chiplets and more than 15x the density compared to existing 3D stacking solutions. This enables a much more efficient and denser integration of our IP. The die-to-die interface uses a direct copper-to-copper bond with no solder bumps of any kind. This approach improves thermals, transistor density and interconnect pitch over other 3D approaches and is the most flexible active-on-active silicon stacking technology in the world, consuming less than 1/3 of the energy per signal of micro bump approaches. Our first server CPUs with 3D chiplet technology are code-named Milan-X. These processors have 3x the L3 cache compared to standard Milan processors. At the top of the stack, that adds up to 804 megabytes of total cache per socket. This additional L3 cache relieves memory bandwidth pressure and reduces latency, and that, in turn, speeds up application performance dramatically. Milan-X is built on the same Zen 3 cores as our general-purpose Milan processors, with up to 64 total cores. It's the fastest server processor for technical computing workloads, with more than a 50% uplift compared to Milan processors, which are already the fastest in the market today. And they're fully compatible with third-gen EPYC platforms. With a simple BIOS upgrade, our customers can drop Milan-X into existing platforms. This accelerates customer qualification and enables faster deployments. And these CPUs also take advantage of software as is, with no changes required. Now let me show it to you for the first time. This is the third-gen AMD EPYC processor with 3D V-Cache, Milan-X.
We have removed the lid from this package so that you can see the 6-millimeter by 6-millimeter SRAMs hybrid bonded to each of the 8 Zen 3 CCDs. To tell us more about the breakthrough per-core performance that 3D V-Cache technology brings to EPYC, let me welcome Dan McNamara.
Daniel McNamara (Executive)
Thank you, Lisa. The market traction with EPYC continues to accelerate, and we are pleased with our customer and partner adoption since our Milan launch. The next step in our journey is to deliver more differentiation and value with a focus on performance per core. So we are really excited about bringing 3D V-Cache to market with Milan-X. As a design target for Milan-X, we zeroed in on technical computing applications. These are some of the most complex and demanding workloads in the data center. These applications are typically enablers of product design. Finite element analysis and structural analysis tools are used to simulate and improve the design of physical systems. Computational fluid dynamics is used to simulate physical interactions across a broad range of applications, from consumer product designs to aerospace engineering. Just as these software solutions are used to simulate the physical world around us, EDA tools are used to simulate and optimize chip design. While we were architecting Milan-X, we looked deeply into how these applications behave and found that a large cache was critical to attaining better performance. More L3 cache ensures that critical data is closer to the cores, and that reduces latency in the system. So we saw a great opportunity to apply our innovative AMD 3D V-Cache to these applications and deliver a new level of performance to our customers. But before I show you what Milan-X can do, let me first refresh on the Milan processors currently in the market. Today's Milan processors deliver clear performance leadership across a wide range of technical computing workloads. Here are benchmark results comparing our 32-core EPYC 75F3 versus the 32-core Xeon 8362 on key technical computing workloads. As you can see, Milan delivers distinct advantages compared to third-gen Xeon Scalable processors, with anywhere from 33% to 40% uplift in performance.
With this as a backdrop, I will show you how we are extending our leadership even further with Milan-X. Let's take a look at EDA. Chip design is an iconic technical computing workload. It is highly compute-intensive and complex. One of the most important tasks in SoC design is verification. Verification proves that each structure in the design does what it's supposed to do. And it also catches defects early in the development process, before the chip is committed to silicon. Today, we are showing a demo of the Synopsys VCS tool. VCS is the primary verification solution used by many of the world's top semiconductor companies. On the left side, you see our leading third-gen EPYC server CPU. And on the right side, you see our Milan-X CPU with AMD 3D V-Cache. Both are running Synopsys VCS. Each server is simulating an AMD RDNA 2 graphics core. VCS generates a model of this chip from the source code and then uses that model to simulate the design by running various tests. You'll see individual tests for the design change color as each is completed. As you can see, Milan-X completes more tests in the hour, getting to full coverage in a shorter amount of time. These results show that Milan-X-based verification completes 66% more jobs than Milan. If you consider the competitive analysis that I started with, you can see that this new solution will bring the next level of value and performance to our customers. Users can finish their verification and get to market faster, or add more tests to further improve the quality and robustness of their design. Either way, Milan-X delivers 66% more performance, and that will translate directly to the efficiency and quality of product development. This step function in performance will be delivered out of the box with existing applications when Milan-X launches. These applications are developed by leading ISVs for some of the simulations I just covered and many others.
We have deep engineering engagements with key market-leading software vendors: Altair, ANSYS, Cadence, Siemens and Synopsys, just to name a few. They're all very excited about the capability and performance of Milan-X. We are working closely with them to bring the combined hardware and software solution to market. As we continue tuning and optimizing these applications, we expect even more benefit for our customers. Our partners will be ready with certified and highly performant applications running on day 1 at launch. While we see tremendous value across technical computing with Milan-X, we also see that a broader set of applications can benefit from a larger L3 cache. In today's data-driven economy, real-time decision-making is a must. In applications like data mining, risk analysis and anomaly detection, getting to insights faster is extremely important. With Milan-X, more data can be kept closer to the processor, driving faster outcomes. For media and entertainment, an industry that is transforming to deliver high fidelity in real time, more L3 cache will translate to more parallel live streams per server. And with AI, fitting more model weights and activations into the larger L3 cache can enable real-time inference. We are engaging today with ecosystem partners across these domains to develop turnkey solutions with increased performance and scalability. As you can see, we're excited about the impact Milan-X will drive across the technical computing landscape, delivering value in 3 important ways: increased design productivity, higher-quality products and faster product design cycles leading to faster time to market. There is tremendous enthusiasm amongst our partners to bring Milan-X solutions to market. Now, let me hand it back to Lisa to talk more about our partner plans.
Lisa Su (Executive)
Thank you, Dan. Now let's talk about Milan-X availability. One of our premier cloud partners, Microsoft Azure, is first to take advantage of the benefits of Milan-X. To tell us more, here is Executive Vice President of Microsoft Azure, Jason Zander.
Jason Zander (Attendee)
Thank you, Lisa. Microsoft and AMD share a vision for a new era of high-performance computing in the cloud, one defined by continuous improvements to the critical research and business workloads that matter most to our customers. We've partnered with AMD to make this vision a reality in Azure with our HB-series virtual machines, which offer up to 12x the performance of other clouds and rival some of the most powerful supercomputers in the world. It's a fantastic platform for our customers to solve their HPC challenges radically faster and with greater cost effectiveness. Today, we're excited to announce the latest enhancements to the Azure HPC platform. Milan-X processors are coming soon to third-generation Azure HB-series virtual machines. We're also announcing today a preview program for customers to get early access to Milan-X processors in Azure. We're most excited about how these performance gains will help our customers and partners do their work better, as the significant improvements to memory latency and bandwidth with Milan-X are a big win. For example, ANSYS Cloud is an integrated suite of engineering simulation tools and services, all hosted on Azure. In early testing of HB-series with Milan-X, ANSYS saw up to an 80% increase in the performance of their customers' aerospace simulations using Fluent. For other customer workloads, such as automotive crash test modeling, we're seeing up to 50% higher performance. And that doesn't even begin to tell the story of the manyfold increase customers can experience over most on-premises hardware in use today. Finally, we're extremely excited about the ability of Milan-X to advance the performance and total cost of RTL simulations in Azure. This is the key HPC workload for digital and mixed-signal silicon companies. Milan-X brings some of the largest performance enhancements to RTL simulation in the modern history of silicon design.
It's a giant leap forward toward Azure becoming the best platform in the world for silicon design, both now and far into the future. Our ongoing partnership with AMD and the innovations that we've seen along the way continue to move us forward and empower our customers to achieve more.
Lisa Su (Executive)
Thank you, Jason. We're so appreciative of the partnership between AMD and Azure, and we're excited about the preview with Azure HPC powered by Milan-X. We're also working with the world's leading OEMs on Milan-X. Milan-X platforms will be broadly available from Cisco, Dell, HPE, Lenovo and Supermicro. And I'm excited to announce we're on track to launch Milan-X in the first quarter of 2022. Okay. So now let's turn our attention to accelerated computing in the data center. This is where the demand for compute power from scientists and researchers, who need to analyze and make sense of incredible amounts of data at the highest speeds, has never been greater. And GPUs are the accelerator of choice for these ultra-demanding workloads. We've been on a journey to build a leadership compute GPU architecture and road map. Last year, we introduced CDNA, our first GPU architecture optimized specifically for the data center. And with it, we delivered up to 11.5 teraflops of FP64 performance with our MI100 products. Today, we're introducing our first CDNA 2 architecture-based products. CDNA 2 was designed specifically to enable exascale computing. I'm very excited to introduce the AMD Instinct MI200 GPU built with CDNA 2 architecture. The MI200 Series delivers up to a 4.9x increase in HPC performance over the competition. It's just a massive step. With this leap in capability, MI200 will set new performance records across a broad set of HPC applications. MI200 delivers up to 1.2x higher peak flops of mixed-precision performance for leadership AI training, helping to fuel the convergence of HPC and AI. And with MI200 and ROCm, the world's most powerful high-performance computing and AI platform, we're shortening the time between initial hypothesis and discovery. For example, drug interaction simulations that would take days to run can now provide researchers with results overnight. Now let me show you the top-of-the-stack MI200 for the first time.
It contains 2 CDNA 2 GPU dies for a total of 58 billion transistors in 6-nanometer. This allows for up to 220 compute units and 880 Matrix cores, which is 1.8x more than MI100. It also contains up to 8 stacks of HBM2E memory, making it the world's first GPU available with 128 gigabytes of HBM2E. That's 4x more capacity and 2.7x more bandwidth than MI100. And now, to tell us more about the MI200 series and to see it in action, here is Forrest Norrod.
Forrest Norrod (Executive)
Thanks, Lisa. Today, we're announcing 2 members of the MI200 family. The MI200 OAM, in production today, is a compact module that enables some of the world's most powerful supercomputers, and the MI200 PCIe card will be available soon for a broad set of platforms and customers beyond supercomputing. The MI200 is amazing. Let's look at the 3 pillars that make it unique. The first is our AMD CDNA 2 architecture, which is designed to do one thing extremely well: run compute-intensive HPC and AI workloads. The second is our innovative packaging technology that enables the MI200 to be the world's first multichip GPU. Finally, our third-gen Infinity architecture is delivering unified compute at exascale, with high-speed links and CPU-GPU memory coherence maximizing system throughput. Lifting the lid, you'll see the multi-die construction of the MI200: dual AMD CDNA 2 dies, 4 ultra-high-bandwidth low-latency interconnects between them, 8 stacks of HBM2E memory and another 8 Infinity Fabric links to connect to EPYC CPUs and other GPUs in the node. We put all of this together by continuing our packaging innovation. Today, we are introducing AMD EFB, Elevated Fanout Bridge, a silicon bridge technology. Unlike substrate-embedded silicon bridge architectures, EFB enables use of standard substrates and assembly techniques, providing better precision, scalability and yields while maintaining high performance. With all of this, the MI200 OAM is shattering performance barriers and delivers a multigenerational leap in performance. The MI200 OAM is 4.9x faster than NVIDIA's A100 GPU in peak FP64 performance. This is critical for HPC workloads requiring the highest level of precision, like weather forecasting. The MI200's peak FP32 vector performance is about 2.5x faster. These are the types of math operations used for vaccine simulations. MI200 Matrix cores deliver 95.7 teraflops of FP32 matrix operations, great for high-precision machine learning and training.
It also produces over 380 teraflops of peak FP16 and bfloat16 performance, 20% more than the A100. And for data-intensive applications, the MI200 OAM has an industry-leading 128 gigabytes of HBM2E memory, as Lisa said before, with a staggering 3.2 terabytes a second of total bandwidth. Put it all together, and the MI200 is showing incredible performance on HPC benchmarks and science applications: [ AMD and HPL ] benchmarks about 3x higher than the competition, and about twice the performance across a range of HPC research applications like OpenMM, [ PAC ] and LSMS. The MI200 is delivering the fastest application performance ever seen. Now let's look at another important research application, this time in molecular dynamics. Climate change brought on by greenhouse gas emissions is one of today's most pressing problems. To create more efficient combustion engines and fuels, scientists use high-performance computing to run simulations at the molecular level. To demonstrate the performance of the MI200, let's look at a combustion simulation of [indiscernible] molecule using LAMMPS. LAMMPS is an open source molecular dynamics code widely used by researchers all over the world. On your left, we're running LAMMPS on 4 NVIDIA A100 SXMs. On the right, the same simulation on 4 MI200 OAMs. This is a simulation of a fuel rapidly expanding after detonation. It's about 20 million atoms and captures the first nanosecond as the chemical bonds begin to break. This typically takes days to complete. Obviously, we've time-lapsed our simulation here. The MI200 completes the simulation before the A100 has completed half. What does that mean? Scientists typically run hundreds or even thousands of these simulations to gain insights on new fuel alternatives or engine designs. With the MI200, the time to analyze new compounds is cut by more than half, potentially reducing the characterization time from months to weeks.
This will dramatically accelerate the discoveries that reduce our emissions and carbon footprint globally. Now we need to scale that performance to exascale. Our third-generation Infinity architecture is the key foundational building block. The Infinity architecture provides high-speed interconnects unifying the CPU and all the GPUs in the node to deliver up to 800 gigabytes a second of aggregate bandwidth. It also unifies the CPU and GPU memory with coherent connectivity, reducing data movement and simplifying memory management. This dramatically increases developer productivity and streamlines the programmability of GPUs. Finally, this unified architecture provides a critical leap in making it easier to accelerate legacy CPU, GPU codes to more quickly tap into the power of the MI200. Going even further, scientists are now incorporating AI techniques into HPC workloads to further accelerate data-driven research. AMD provides an open-source GPU compute platform, ROCm, that supports all of the major machine learning frameworks. That means developers can use the most popular AI frameworks on all of our Instinct accelerators, including the MI200. And with ROCm 5.0, MI200 will have the key ML models optimized, including ResNet, BERT and DLRM. Now that we've showcased MI200 performance and capabilities, let's hear from one of our first customers on what it all means, and not just any customer. I would like to invite Thomas Zacharia from Oak Ridge National Labs to share more about the beginnings of the exascale era with the MI200-powered Frontier.
Thomas Zacharia (Attendee)
I'm Thomas Zacharia, the Director of Oak Ridge National Laboratory. Thank you for letting me be part of this amazing announcement of your GPU and CPUs, which are driving and powering America's first exascale supercomputer, Frontier. Now Frontier is going to be this amazing machine, an amazing scientific tool that is going to allow the dreams of many scientists from the world over to be realized, because they have this powerful tool that will allow them to calculate and simulate important challenges. As we think about the most compelling challenges facing our generation, it is about energy transitions, it is climate change, and it is issues that we are currently facing as a society, tackling the pandemic. And Frontier is going to allow us to tackle these important challenges using the capability of the machine driven and powered by the AMD processors, which makes the MI200 the most powerful processor that has ever been made available to the scientists. A single GPU is more powerful than an entire node of Summit, which is currently the fastest supercomputer in the United States. AMD has gone out of their way to make this a very efficient processor. Therefore, it makes Frontier a very efficient supercomputer. And so MI200 is a culmination of a deep-seated partnership between AMD and Oak Ridge National Laboratory, and Frontier is a partnership between AMD, Oak Ridge National Laboratory and HPE. But these important national challenges could not be achieved without that commitment at the top. We are installing the supercomputer as we speak, and we are excited to make this supercomputer available to our scientists and engineers early next year. What Frontier is going to do is to accelerate science and scientific discovery so that we can continue to tackle the important challenges facing [indiscernible].
Forrest Norrod (Executive)
Thanks, Thomas, for sharing your journey with us. All of us at AMD are extremely proud to be part of the efforts to bring Frontier and the exascale era to life. On behalf of the engineering and support teams across AMD as well as our strategic partners, the Cray team at Hewlett Packard Enterprise, I'd now like to show you Frontier. Here is America's first exascale system, Frontier, powered by AMD EPYC processors and MI200s. Frontier is currently being installed at Oak Ridge National Labs and will be coming online very soon and open to scientists next year. At AMD, we're proud to power the largest supercomputers, and with a growing list of partners supporting the MI200, customers at all scales will be able to choose from a range of platforms and solutions to fit their unique needs. AMD is on a journey in accelerated computing. We will make the right engines available to accelerate targeted workloads. We will make them easier to use, and we will help solve some of the world's most challenging problems faster. We look forward to sharing more with you as we continue to push the boundaries in data center computing, making the best even better. Now let me welcome Lisa back.
Lisa Su (Executive)
Thank you, Forrest. I have a couple more updates for you today. Now you've seen this road map before. This is our server CPU road map that we've shown you for the last couple of years. We have executed very well to this road map, delivering Naples, Rome and Milan to market on time and exceeding product expectations. The adoption of Milan has significantly outpaced Rome as our momentum builds. Today, I'm happy to provide an update on Genoa, which will be our flagship fourth-gen EPYC server processor. The engine driving Genoa is our next-generation high-performance core called Zen 4, built on industry-leading 5-nanometer process technology. 5-nanometer is doing extremely well. We've worked with TSMC to optimize 5-nanometer for high-performance computing, and it offers twice the density, twice the power efficiency and 1.25x the performance of the 7-nanometer process we're using in today's products. Genoa will be our first server CPU using that Zen 4 core in 5-nanometer. And when introduced, we expect Genoa will be the world's highest-performance processor for general-purpose computing. It's designed to excel across a broad range of data center workloads, from enterprise to HPC to the public cloud. Genoa will extend our performance leadership, both at the socket level and the per-core level, with up to 96 Zen 4 cores. And it supports next-generation DDR5 and PCI Express Gen 5 memory and I/O technologies, combining next-gen platform capabilities that fully complement the new Zen 4 core. The Genoa platform also includes support for the new CXL interface and will have breakthrough memory expansion capabilities for data center applications. And I'm happy to tell you that Genoa looks great. We're now sampling to customers and on track for 2022 production and launch. Finally, let's turn to the cloud. Cloud-native workloads are a fast-growing class of applications that are developed, deployed and updated rapidly.
These applications typically are very throughput-oriented, and they can take advantage of a high number of threads. We have created a new version of Zen 4 specifically for cloud-native computing, and we call that core Zen 4c. It's fully software compatible with Zen 4, with specific cloud enhancements, including a new density-optimized cache hierarchy to enable additional higher core count configurations for cloud-native workloads that benefit from maximum thread density. And it also includes significantly improved power efficiency and breakthrough performance per socket. We're bringing the Zen 4c core to market with Bergamo, our new cloud-native server processor. Bergamo is a high core count, power-efficient CPU, purpose-built for cloud-native applications. It offers up to 128 high-performance Zen 4c cores to deliver breakthrough performance and power efficiency for cloud-native workloads. And it comes with all the same features as Genoa, including DDR5, PCIe Gen 5, CXL and the full suite of Infinity Guard security features. Bergamo is also socket compatible with Genoa, with the same Zen 4 instruction set, and can be deployed on the same platforms that our customers and partners are qualifying now. We're on track to ship Bergamo in the first half of 2023. So now you see the new and expanded AMD EPYC CPU road map. Our investment in multigenerational CPU core road maps, combined with advanced process and packaging technology, enables us to deliver leadership across general-purpose, technical computing and cloud workloads. We're extremely excited about the value that our next-generation EPYC processors will deliver and look forward to bringing them to market. To wrap up, our CPU, GPU and process and packaging innovations are enabling AMD to deliver leadership performance across the data center. As we come to the end of our time together today, I hope you now see why we're so excited about our vision and plans for the accelerated data center.
You can count on us to continue to push the envelope in high-performance computing. Thank you for joining us today.