Synopsys, Inc. (SNPS) Earnings Call Transcript & Summary
April 20, 2021
Earnings Call Speaker Segments
Aakash Jani
analystOur second presentation and final talk for the session is going to be led by Paul Stravers, who is a principal engineer in the R&D division at Synopsys. Paul has a broad range of experience, with expertise in compute architecture, multicore application design and hardware/software co-design. And believe me, that's just scratching the surface. Today, he's going to be discussing Synopsys ARC processors, which scale across many verticals and deliver unyielding performance. Paul, if you're here, do you want to start things off?
Paul Stravers
executiveYes. Let me share...
Aakash Jani
analystOkay. And I'll just be here until you get your screen up and going, and then I'll shrink into the background. That's good.
Paul Stravers
executiveOkay. Thank you. Then -- well, I don't have to introduce myself. Thank you, Aakash, for doing that.
Aakash Jani
analystYour camera is -- I don't think your camera is on yet.
Paul Stravers
executiveThat's a good point.
Aakash Jani
analystI'm sorry about that. There you go. I can see you now.
Paul Stravers
executiveWelcome. Good morning. You already know who I am, thanks to a nice introduction by Aakash. Let's start. I'll be talking about how to meet your increasing processor performance requirements for high-end embedded applications. So let's start with those applications. I'm just showing you 4; you can probably think of more. We have the SSD space, with rapidly increasing drive capacity; networking, reaching into the home, the edge and the cloud as usage grows; artificial intelligence, requiring very high throughput and very high memory bandwidth; and wireless, where you typically need a very balanced performance, power and area profile. If you have all these applications, then you can make a case for application-specific acceleration. And why is that? We're looking at a couple of factors. One of them is the limit to the instruction-level parallelism and task-level parallelism that we have. Also, attainable clock frequencies have been leveling off for a long time. And there is efficiency: each instruction incurs overhead from fetch, decode, dispatch, speculation, prefetching, what have you. So we're looking for a solution. You might think that we could do a little bit more useful work per instruction, for example, by using SIMD instructions that operate on many data items in parallel. You would then still have standard operations, but the instruction overhead would be amortized over the entire vector. You could also introduce application-specific instruction set extensions: custom instructions that target your application. You can think of each of those as acting like, say, 10 standard instructions. We call that APEX. And you could go even further and design a complete application-specific instruction set processor, or ASIP, which then targets a particular application. We call that flow the ASIP design flow.
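To make the amortization argument concrete, here is a minimal Python sketch that simply counts instructions, assuming every instruction pays the same fixed fetch/decode/dispatch overhead. The function name `instructions_needed`, the 1024-element workload and the 10-operations-per-element figure are illustrative assumptions, not numbers from the talk; the point is only that SIMD lanes and fused custom (APEX-style) instructions both shrink the count that the overhead is multiplied by.

```python
import math


def instructions_needed(n_elements: int, ops_per_element: int,
                        lanes: int = 1, ops_per_instruction: int = 1) -> int:
    """Hypothetical instruction count for processing n_elements.

    lanes: data items handled by one instruction (SIMD width in elements).
    ops_per_instruction: standard operations fused into one custom instruction.
    Every instruction is assumed to pay the same fixed front-end overhead,
    so fewer instructions means less total overhead.
    """
    instrs_per_group = math.ceil(ops_per_element / ops_per_instruction)
    groups = math.ceil(n_elements / lanes)  # one group of instructions per SIMD chunk
    return groups * instrs_per_group


N, OPS = 1024, 10  # illustrative workload, not from the talk
print(instructions_needed(N, OPS))                          # scalar: 10240
print(instructions_needed(N, OPS, lanes=4))                 # 4-lane SIMD: 2560
print(instructions_needed(N, OPS, ops_per_instruction=10))  # one fused custom op: 1024
```

The model ignores how expensive each instruction's execution is; it only captures the per-instruction overhead that Paul describes being amortized.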
Now you could take another step, go even further, and integrate application-specific hardware accelerators into your system. This is full custom hardware that's optimized for a particular algorithm or class of algorithms. It does not execute any instructions, so you don't have any of the overhead from fetch, decode and the other things we saw above. But it does require efficient interfacing with the software and other hardware it accelerates. So we're looking here at the whole balance, a heterogeneous mix of processor solutions. And probably, you want to have a little bit of everything, or a little bit of some. So how do you integrate these things? Here, we have an example with the ARCv3 cluster. In the upper left, you see that you can integrate up to 12 cores into a single cluster. Those cores can be general-purpose cores that include caches and possibly also APEX custom instructions. They don't all have to be the same: you could have, say, 4 of one type and 8 of another type, or just use fewer. The cluster then provides accelerator target ports, and those target ports can be used to connect your dedicated hardware accelerators that execute exactly what you need for your application. Such an accelerator may actually have its own internal SRAM for storing data. But before we get there, you might also have I/O devices, which you can also connect to the cluster. At some point, you may want to read or write data that's also written here, in the closely coupled memory that belongs to the accelerator. So for that, the cluster needs to provide initiator ports that can target these things. You can also target a boot ROM or some other memories. Of course, there is more than just these local memories: there is DDR, and there could be a network on chip that provides even further memories, and you want to access those as well. There's a peripheral network with all your peripherals that you need to access, and there is integration of power management.
In the middle, you see that this cluster provides a shared cache and shared memory. These are banked so that we can have multiple concurrent accesses to the cache and the memory, and we'll get to that in the next slide actually. So what we see is that the ARCv3 cluster is all about scalability. We saw that 12 cores and 16 custom accelerators can be connected. This is surely enough to run multiple software and hardware tasks concurrently, and we have quality of service to isolate those tasks from each other. There are multiple clock domains per core and per accelerator to simplify integration and physical design efforts. The bandwidth scales up into the terabyte-per-second region, which is enough to support high-performance accelerators and deep learning networks. Connections are individually configurable from narrow, like 32-bit, all the way to 512-bit wide, with physical addresses of up to 52 bits. There are up to 4 DDR connections, as wide as 512 bits, with up to 128 outstanding transactions, which is definitely enough to saturate even very long-latency memory channels. The shared L2 cache and memory provide banking with up to 32 banks, which can be up to 512 bits wide and very large in size. But you can also scale down. Actually, we have quite a few customers who are looking for low-cost and low-power products and solutions. So you could create a cluster that has only 2 cores, only 2 data banks and one narrow connection to the DDR. That's also possible. It's the same product, but you see it targets a wide range of options in terms of scalability. What is important is that the architecture focuses on reducing physical design effort for the customer. For example, a single global high-speed data network carries all cluster-level data communications, still with bandwidth guarantees to implement full task isolation on shared physical connections.
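The banking and bandwidth claims can be sanity-checked with a little arithmetic. The sketch below assumes a simple low-order interleave, where consecutive 64-byte (512-bit) lines map to consecutive banks, and a hypothetical 1 GHz cluster clock; the talk does not describe ARCv3's actual bank-selection function or L2 clock, so treat both as placeholders. Under those assumptions, 32 banks that each complete one 512-bit access per cycle give a theoretical peak of about 2 TB/s, consistent with the "terabyte per second region" quoted above.

```python
LINE_BYTES = 64   # one 512-bit bank access (assumed granularity)
NUM_BANKS = 32    # maximum bank count mentioned in the talk


def bank_of(addr: int, num_banks: int = NUM_BANKS, line_bytes: int = LINE_BYTES) -> int:
    """Hypothetical low-order interleave: consecutive lines go to consecutive banks."""
    return (addr // line_bytes) % num_banks


def peak_bandwidth(num_banks: int, bank_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak in bytes/s if every bank completes one access per cycle."""
    return num_banks * (bank_width_bits // 8) * clock_hz


# A streaming access pattern spreads across all banks, so accesses can proceed concurrently.
banks = {bank_of(i * LINE_BYTES) for i in range(NUM_BANKS)}
print(len(banks))                                  # 32 distinct banks
print(peak_bandwidth(NUM_BANKS, 512, 1e9) / 1e12)  # ~2.048 TB/s at an assumed 1 GHz
```

Real traffic rarely hits every bank every cycle, so this is an upper bound rather than an expected figure.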
And those connections can actually be of several kinds: synchronous, asynchronous, [ fivefolds ] synchronous or any combination thereof. This means that any concerns you might have in your back end can always be matched with some configuration option in this cluster. Now, with all those cores sitting there in that cluster, you might ask yourselves how you use them effectively. Here on the right side, you see the blue curve, which is not a very good curve. As you add cores, on the horizontal axis, performance might increase up to some point and then it doesn't really; it may even become worse. Let's call this the death curve. We want to be on the good curve, the purple curve. And the question is, how do you get on the good curve? First of all, your application has to be parallelized. Fortunately, for significant portions of SSD, networking, AI and wireless workloads, this is actually the case. So you can partition the work into many independent tasks that can run concurrently on the available processors. However, never forget that Amdahl, five decades ago, formulated a law that says your speedup will always be limited by the serial portion of your program. So if 5% of your application is not parallelizable, you cannot hope for any speedup beyond 20x. Synchronization overheads have to be low. If they're not, you might end up on the blue curve. This typically points to a shared resource, often a synchronization resource, that is shared by too many cores, so latency goes up as the number of cores goes up, and then performance actually goes down. You need semaphores and mailboxes, all implemented very efficiently. You're not only synchronizing among tasks; you're also exchanging data. So the communication bandwidth has to be sufficient, and the memory and interconnect architectures are key here. One funny fact is that in some cases, you can get superlinear speedup. And that's, of course, great.
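Amdahl's law gives speedup = 1 / (s + (1 - s)/n) for a serial fraction `s` on `n` cores, so the ceiling as `n` grows is 1/s, which is where the 20x figure for 5% serial code comes from. The sketch below also adds a hypothetical per-core synchronization cost `sync_cost * n` to the denominator; that term is a simple model of the contended shared resource described above, not something from the talk, but it is enough to reproduce the shape of the "death curve", where adding cores eventually lowers performance.

```python
def amdahl_speedup(serial_fraction: float, cores: int, sync_cost: float = 0.0) -> float:
    """Speedup on `cores` cores relative to one core without synchronization.

    serial_fraction: share of single-core run time that cannot be parallelized.
    sync_cost: hypothetical extra time per core (relative to single-core run time)
               spent on a contended shared resource; 0.0 gives plain Amdahl's law.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores + sync_cost * cores)


s = 0.05
print(1 / s)                            # 20.0: the ceiling for 5% serial code
print(round(amdahl_speedup(s, 12), 2))  # 7.74 on a 12-core cluster
for n in (8, 22, 64):                   # with sync overhead the curve peaks, then falls
    print(n, round(amdahl_speedup(s, n, sync_cost=0.002), 2))
```

With `sync_cost=0.002`, the speedup peaks around 22 cores and drops again by 64 cores; the exact peak depends entirely on the assumed cost.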
It doesn't happen too often. But when it does, it's usually because your software working set starts fitting in the aggregate of all the caches. And if you happen to see it, you're really lucky, because you get even more performance as you add cores than you would have expected. Okay. So this processor, the HS68, that we're introducing is a 64-bit processor that runs the ARCv3 instruction set, specifically the ARC64 version of ARCv3. This means that you have 64-bit data, and you can still add custom instructions using our tools. The pipeline is 10 stages. It provides full free load at compare branch, so we have very efficient control code execution. For the memory architecture, we've seen some of that: there's a D-cache and an I-cache. Closely coupled memories are available for both data and instructions. This gives you ways of dealing with real-time constraints, because you have very predictable access to your CCM. There is a shared L2, and again, there's also a shared memory for real-time support. We issue 2 instructions per cycle with out-of-order completion. We have over 6 CoreMarks per megahertz with a 3 GHz clock frequency under typical conditions, 12 cores plus 16 accelerators, and up to a 52-bit physical address space. There's 128-bit SIMD floating point in multiple data sizes, hardware-managed [ DLG ], and Linux and [indiscernible] support. And as you see on the right side, 128-bit loads and stores can be issued back to back even if they're unaligned. There are streaming reads and writes for data that has no temporal locality [indiscernible], and in-memory atomics for collision-free task collaboration. So there's not just a lot of performance here; there are also scalable multicore features. Now let's look at the ASIP, right? We saw what an ARC core means if you integrate it in the system. With an ASIP, we need to design it because it's not there yet. So you start with an architecture, as you can see in the upper right corner, and an algorithm.
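A quick back-of-the-envelope check on the headline numbers, using only the figures quoted above (over 6 CoreMark/MHz at 3 GHz, up to 12 cores). The 12-core figure is a naive upper bound that assumes perfectly independent copies of the benchmark with no shared-resource contention, so treat it as illustrative rather than a published score.

```python
COREMARK_PER_MHZ = 6.0   # "over 6", so this is a lower bound per core
CLOCK_MHZ = 3000         # 3 GHz, typical conditions
MAX_CORES = 12

per_core = COREMARK_PER_MHZ * CLOCK_MHZ
naive_cluster = per_core * MAX_CORES  # assumes perfect scaling and no contention

print(per_core)       # 18000.0
print(naive_cluster)  # 216000.0
```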
The architecture is an initial guess at what you think would be a good fit for development. You feed it into ASIP Designer, and it will generate the tools. You can then compile your firmware, simulate it on the simulator created by ASIP Designer, and create a profile. You look at it, and you get ideas, right? You get ideas to improve your architecture, and maybe also your algorithm. And then you get another tool chain and new firmware that runs on a new simulator, and some more ideas on how to refine it. So you go around a couple of times. At some point, you generate your RTL. You have an ASIP that you're happy with. And including its firmware, it will handle the class of algorithms very well, not just the algorithm that you fine-tuned it for, but also some variations on it. You can have field upgrades if you put your firmware in MCM. And your power and area no longer include a lot of overhead from instruction fetch and decode, just a little bit [indiscernible], which is really efficient. So you have a balance between the flexibility of a processor and the efficiency of hardware accelerators. And putting that in perspective on the graph, where efficiency is horizontal and flexibility vertical, you see that you have a whole choice of components in your cluster to either optimize for efficiency or go for more flexibility, and anything in between. So ARCv3 offers a choice of hardware acceleration. We have APEX on the left, which is custom [indiscernible] that you can integrate into the pipeline using our tools. It has direct access to the processor registers, but you need load/store instructions to connect it to memory. On the right side, we see closely coupled accelerators. There, you have your own hardware that executes your algorithm and accesses shared memory. This can be high bandwidth, low latency and either [indiscernible].
You can still include your own CCM, as you've seen before, and that CCM can then be mapped into the global address space of the cluster, so it can be accessed by other cores. So which one should you use? Use APEX if you have fine-grained interaction with software, something on the order of 10 instructions that you want to combine into a single APEX instruction. Typically, you then operate on scalar data. For example, a butterfly instruction or an AES encryption round would be good candidates for APEX. If you want a closely coupled accelerator, you probably have coarse-grained interaction with software. Those accelerators are able to execute complete algorithms, including [indiscernible] computations and memory accesses. And typically, they operate on that type of data. An example would be a wide [indiscernible] transform engine or a convolutional neural network accelerator. For example, you see here an ARCv3 cluster that has 2 types of ARC processors. As you can see, it has 4 back-end processors that include APEX accelerators with instructions, basically I/O instructions, for example to control NAND flash with very low latency and powerful interaction with the NAND. Then you have a set of front-end processors, which might have larger caches and maybe MMUs that you may not want on the back-end side, or some slightly different configuration that is optimized for the front-end tasks. And then there's still the shared Level 2 cache and shared memory that both can access, and also a couple of dedicated hardware accelerators connected, which also have private connections. All of this is possible with the ARCv3 cluster. Let's look at an example. In this example, I'm going to show you how data moves through the cluster in what you would call a hardware/software co-design exercise. We have a processor here; let's take core #5. It produces data that needs to be consumed by this ASIP. The ASIP needs that data and needs access to it.
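As an illustration of the "roughly 10 instructions" granularity that suits APEX, here is a radix-2 FFT butterfly written out as plain scalar operations in Python. It uses 4 multiplies and 6 adds/subtracts, i.e. about ten standard instructions that a single custom instruction could, in principle, replace. This only shows the arithmetic being fused; it is not how an APEX instruction is actually defined, and real DSP code would typically add fixed-point scaling that is omitted here.

```python
def butterfly(ar: int, ai: int, br: int, bi: int, wr: int, wi: int):
    """Radix-2 decimation-in-time butterfly on complex values (a, b) with twiddle w.

    Returns the real and imaginary parts of (a + w*b) and (a - w*b).
    """
    # Complex multiply t = w * b: 4 multiplies, 2 add/sub.
    tr = wr * br - wi * bi
    ti = wr * bi + wi * br
    # Sum and difference outputs: 4 add/sub.
    return ar + tr, ai + ti, ar - tr, ai - ti


# a = 1+2j, b = 3+4j, w = 1 (twiddle for k = 0)
print(butterfly(1, 2, 3, 4, 1, 0))  # (4, 6, -2, -2)
```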
But the thing about ASIPs is that they are most efficient if they can access data that is in their local CCM, because of the low latency and predictability. So that is what we want. We have 2 white boxes here, in the lower left and bottom middle, where we show what is happening on ARC core #5 and on ASIP #1. Okay, let's start. The first thing that happens on core #5 is that it executes a function; we'll call it acquire_room. It specifies a remote CCM memory and a size. The effect of this function is that it allocates a buffer of the requested size in this remote CCM and returns a pointer to the software [ winning ] here. If you dereference that pointer, the cluster will know how to route the data to this buffer. So now this processor starts executing a loop that calls, for example, filter_step. The data returned by filter_step is written to p at offset i, and then i is incremented. So you see a series of samples from the filter being produced. And because the loop dereferences p, which was obtained with the acquire_room call, the cluster knows how to forward the data to exactly that buffer. So this buffer starts filling with data. At some point, the producer, our core #5, is ready, and it releases the data. Now the system understands that the data is available for consumption. In the meantime, the ASIP was doing other work, but at some point it arrives at a piece of code that calls acquire_data with a particular size. That returns the same pointer p, which now points into its own local CCM. So it can now access the data, as you see here with the pink arrow. And as you can see, it's indeed a low-latency, high-bandwidth connection to the data that was produced on this far [ HS5 ] core; not far, it's still in the same cluster, but let's call it the remote processor. So you get low latency and high bandwidth. At some point you're done, and you can move [indiscernible].
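The producer/consumer handshake in this walkthrough can be modelled in a few lines of Python. This is a hypothetical software model, not the ARCv3 API: `RemoteCCM`, `acquire_room`, `release`, `acquire_data` and `filter_step` are illustrative names taken loosely from the talk, a `bytearray` stands in for the ASIP's local CCM, and a `threading.Event` stands in for whatever hardware mechanism signals that data has been released. What it shows is the key property: the producer writes through a pointer into the consumer's memory, and the consumer gets a view of that same buffer only after the release.

```python
import threading


class RemoteCCM:
    """Hypothetical model of a buffer in an accelerator's local CCM."""

    def __init__(self, ccm_bytes: int):
        self.ccm = bytearray(ccm_bytes)  # stands in for the ASIP's local CCM
        self._released = threading.Event()
        self._size = 0

    def acquire_room(self, size: int) -> memoryview:
        """Producer side: reserve `size` bytes and return a writable view (the 'pointer')."""
        if size > len(self.ccm):
            raise ValueError("requested buffer does not fit in the remote CCM")
        self._size = size
        return memoryview(self.ccm)[:size]

    def release(self) -> None:
        """Producer side: signal that the buffer is ready for consumption."""
        self._released.set()

    def acquire_data(self, size: int) -> memoryview:
        """Consumer side: wait for the release, then return a view of the same buffer."""
        self._released.wait()
        if size > self._size:
            raise ValueError("more data requested than was produced")
        return memoryview(self.ccm)[:size]


def filter_step(i: int) -> int:
    """Hypothetical stand-in for the filter; returns one byte-sized sample."""
    return (3 * i) & 0xFF


def producer(ccm: RemoteCCM, n: int) -> None:
    p = ccm.acquire_room(n)
    for i in range(n):
        p[i] = filter_step(i)  # writes land directly in the consumer's CCM
    ccm.release()


ccm = RemoteCCM(ccm_bytes=256)
core5 = threading.Thread(target=producer, args=(ccm, 16))
core5.start()
samples = ccm.acquire_data(16)  # the 'ASIP' blocks here until core 5 releases
core5.join()
print(list(samples[:4]))  # [0, 3, 6, 9]
```

The single event models one buffer handed over once; a real design would recycle buffers, which this sketch does not attempt.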
Applications are requiring ever more performance, as we've seen in several application areas. We looked at SSD, networking and AI; there are many more. Performance can be scaled along several axes: the architect can deploy multicore, APEX, ASIPs and hardware accelerators. It's all about improving your architecture. So what we have here is a cluster that can scale to exceptional performance levels. And if you use [indiscernible] cluster, you have the best of those products: optimal power, performance and area. And if you map your application smartly, you'll have the efficiency of hardware and the flexibility of software, including field upgrades. Thank you.
Aakash Jani
analystThank you. That was -- I have to say it brought me back to like my computer architecture class when you're like bringing in Amdahl's law, you're bringing in the walk-through of the processors. I appreciate it. It's really well explained. So I just -- I wanted to kind of start off with a question from the audience, if that's okay. Do you mind minimizing your presentation so you can take up the full screen? Perfect.
Paul Stravers
executiveYes.
Aakash Jani
analystSo the first question comes about the ARCv3 platform. Is it ASIL B or ASIL D compliant?
Paul Stravers
executiveYes. So we have an automotive edition. And that indeed includes ASIL B and ASIL C, even ASIL D.
Aakash Jani
analystOkay. And in that same vein, with out-of-order execution, what do your customers do for deterministic computing?
Paul Stravers
executiveWell, you see, it's not out-of-order execution; it's out-of-order completion, right? And out-of-order completion actually helps you get a more predictable outcome, because you don't have to stall your pipeline whenever a cache miss happens. You have more ways of continuing the computation even while a cache miss is outstanding, so your execution stream is less jumpy. If hard real-time or soft real-time constraints are in play, then we do advise that you use the closely coupled memories in the ARC processor. You can use them for instructions or for data or both. But even if you don't, and you rely on the Level 1 caches, you still have a shared memory in the cluster that is completely software managed and therefore very predictable. Okay. For the SRAM that we have in the cluster, I mentioned you can have up to 60 megabytes of SRAM there. You can dynamically repartition it and assign it to either shared cache or shared memory. So if you have an application that evolves over time, and there are periods where you need more shared memory, that's okay. You can just repartition it on the fly and continue execution.
Aakash Jani
analystOkay. Then kind of another question was -- it's kind of discussing your memory subsystem. I see that you guys are using a snoop filter. So as you start to scale across more and more cores, when does it make sense to move from a snoop filter to more of a directory-based system?
Paul Stravers
executiveI think the moment that you go to multiple clusters, right? We found that 12 cores in a cluster, with the support of the sort of snoop filters that we have, is a good point. There are applications that lend themselves to scaling to even more cores. But if you look at a really broad range of applications, it's a sweet spot: you basically have the benefits of closeness and deep integration, with a low-latency interconnect between the cores. But at some point, as you are hinting at, the cost of snooping becomes too high. Then you basically instantiate another cluster, and you can use a network on chip that includes directory protocols or other smart algorithms to deal with more global [indiscernible].
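To illustrate the trade-off Paul describes, here is a deliberately simplified snoop-filter model in Python. It is a generic textbook idea, not a description of the ARCv3 implementation: the filter remembers which cores may hold each cache line, so a write only has to snoop those cores instead of broadcasting to all of them. Reads that must fetch dirty data from another core are not modelled, and the 64-byte line size is an assumption.

```python
from collections import defaultdict


class SnoopFilter:
    """Simplified snoop filter: cache-line index -> cores that may hold a copy."""

    def __init__(self, num_cores: int, line_bytes: int = 64):
        self.num_cores = num_cores
        self.line_bytes = line_bytes
        self.sharers = defaultdict(set)

    def _line(self, addr: int) -> int:
        return addr // self.line_bytes

    def read(self, core: int, addr: int) -> None:
        """Record that `core` may now cache the line (dirty-data forwarding not modelled)."""
        self.sharers[self._line(addr)].add(core)

    def write(self, core: int, addr: int) -> set:
        """Return the cores that must be snooped (invalidated) before `core` writes."""
        line = self._line(addr)
        targets = self.sharers[line] - {core}
        self.sharers[line] = {core}  # the writer is now the only holder
        return targets

    def broadcast_snoops(self) -> int:
        """Snoops a filter-less broadcast scheme would send for any write."""
        return self.num_cores - 1


sf = SnoopFilter(num_cores=12)
sf.read(0, 0x1000)
sf.read(1, 0x1010)          # same 64-byte line as 0x1000
print(sf.write(2, 0x1000))  # {0, 1}: 2 snoops instead of 11 broadcasts
```

The filter itself has to be looked up and kept up to date on every access, which is part of why, as Paul says, the cost of snooping eventually becomes too high and a directory-based network on chip takes over beyond a single cluster.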
Aakash Jani
analystOkay. And then my next question was for the ASIP. One of the positive points you were discussing was that it allows for field upgrades. Could you discuss a little more what you mean by allowing for field upgrades?
Paul Stravers
executiveYes. What often happens in the context of security, but also maybe feature improvements, is that you have a product out in the field and the need arises to change your firmware. If the firmware determines all the details of the function, which is exactly the case with an ASIP, and you put that firmware in some nonvolatile memory, or maybe even on disk, depending on what device we're talking about, and there is some sort of connectivity, then you could have over-the-air upgrades of your firmware so that the security problems, or whatever it was that you tried to address, get fixed.
Aakash Jani
analystOkay. And now moving to closely coupled accelerators. Is that something Synopsys offers as its own IP? Or is that external IP from the customers themselves? Can you discuss closely coupled accelerators a little bit more?
Paul Stravers
executiveWell, in principle, you can connect anything that talks industry-standard protocols like [indiscernible], so it doesn't really matter what it is. Of course, we do have quite a few devices in our own portfolio that meet that requirement. So we have [indiscernible]-compatible devices; you can just connect those. But I can imagine that more advanced customers want to design their own hardware accelerators, possibly because that is a differentiator for them in their market. That's where they want to shine, and that's where they put the deep algorithmic knowledge that their competitors don't have, whereas a general-purpose processor typically is not considered to be very differentiating.
Aakash Jani
analystSo, just to sum it up: regardless of the IP, as long as it's working through one of the industry-standard communication protocols, coupled with your solution, it would automatically become a closely coupled accelerator, right?
Paul Stravers
executiveRight. That's it. Nicely put; there's nothing to add to it.
Aakash Jani
analystOkay. That's cool. That's a pretty nifty feature. Well, I think we're running out of time as far as Q&A goes. So I just want to throw this out there before time really runs out: you're going to be joining us later today for the breakout session, correct?
Paul Stravers
executiveYes. 12:40, yes.
Aakash Jani
analystYes. So yes, 12:40, 11:40 Pacific. Yes. I'm in Central Time, so I feel you. So you'll be joining us for the breakout session, and I urge everybody, please come sign up, because we'll be able to get that personal one-on-one in a smaller setting to ask some of those questions, like the types of instructions and what customers are doing with their ASIPs, and to understand a little more about the ARC platform versus just the individual components.
Paul Stravers
executiveYes.
Aakash Jani
analystAnd yes, do you want -- do you have anything else you want to kind of throw in for the breakout session?
Paul Stravers
executiveNo, you're welcome. I think it's going to be interesting. We're going to discuss anything you want to discuss regarding Synopsys. So if you have any other questions that go beyond what we discussed here, you're all welcome.
Aakash Jani
analystI want to break out my architecture book and come with questions, too. I'm going to be there. And you inspired me with Amdahl's law and made me think of Gustafson's law with weak scaling. So I'm going to come with questions.
Paul Stravers
executiveAll right.
Aakash Jani
analystSo thank you for coming. Thank you for taking time out of your day, and I appreciate you putting together this presentation. It was really well done. It was informative. It was easy to follow. And again, thank you, thank you, thank you for speaking at our conference. So actually, this talk concludes our embedded SoC design session for today. I just want to kind of make sure that everybody remembers we're going to be having breakout sessions that start at 11:40 a.m. PST. Synopsys is going to be there. Like I said, Paul Stravers is going to be there. So please come sign up, kind of get to know these vendors a little bit more intimately, understand their product. And now we're going to be taking a 10-minute break and resume again at 9:40 PST. And in the meantime, please enjoy the content, which is from our sponsor, which I believe is actually ARM today.
Linley Gwennap
analystAll right. Welcome, everybody, to the Synopsys breakout session here at Linley. A couple of ground rules. Basically, I'll be conducting an interview with our R&D folks here. If you have any questions about either the talk that Paul Stravers gave yesterday or any other aspects of processors at Synopsys or our portfolio, feel free to put those in the chat, and we'll answer them as we go. That makes sense? All right. Let me pull up the chat. Okay. All right. And I guess if people show up, I will go ahead and admit them as they come. All right. Okay. So let me start by introducing our guests. I have Kulbhushan Kalra, one of our directors of R&D, focused on our high-performance ARC processors. And I have Carlos Basto, one of our high-performance processor architects. Unfortunately, Paul Stravers, who gave the presentation yesterday, was not able to attend today. But Carlos can certainly answer any specific questions that you might have, in particular on our high-performance processors. Okay. Like I said, feel free to put those questions into the chat, and we'll either answer them live in the chat or, if it's something we need to get back to you on, we'll get your information and get back to you.
Linley Gwennap
analystAll right. Well, let me start off with a couple of questions for Kulbhushan. Like Paul covered in his session yesterday, embedded applications are definitely demanding more and more single-core performance. How does the ARCv3 architecture that you've recently announced and are now delivering to lead customers, including the cluster, help customers meet those performance needs?
Kulbhushan Kalra
executiveThat's a good question. That's indeed one of the strengths of ARCv3. We have seen this for quite a few years: we have been increasing single-core performance, but there is a limit to how much performance a single core can deliver versus what embedded applications need. The requirements in many embedded spaces, like storage and networking, are increasing quite significantly. That's what we designed ARCv3 for. The architecture is focused both on improving single-core performance with best-in-class efficiency and on multicore scalability for high real-time performance. As Paul explained yesterday in his presentation, the ARCv3 cluster interconnect brings many of these benefits for multicore scalability, getting the highest performance possible with optimal area and power. A few key things: close integration of hardware accelerators, which lets the accelerators also use all the cluster bandwidth and the resources in the CPU cluster; increasing the number of cores that can share those resources, with the interconnect and bandwidth scaling up to 12 cores, while still allowing real-time performance with features like quality of service, so that no one core or hardware accelerator hogs all the resources, and partitioning into different domains; and also allowing heterogeneous CPUs, so not every CPU needs to be the same, and different applications can now be combined into a single cluster.
Linley Gwennap
analystAll right. Great. You mentioned a couple of applications. Are there specific embedded applications that are well suited to this type of architecture?
Kulbhushan Kalra
executiveA few of the key ones: storage is one, and wireless is another. Storage capacity needs have been increasing exponentially over the last few years. Same on the wireless side: 5G applications have significantly higher data rates than what 4G and others supported. That requires much higher performance than any single core can provide, allowing [indiscernible] execution across multiple cores while still being able to share the bandwidth and the cluster resources.
Linley Gwennap
analystOkay. Makes sense. Here, let me go with a couple of questions on the chat. I attended Paul's session yesterday, and I would like to know a bit more about the ARC extensions you discussed, specifically how do these custom instructions integrate with the Synopsys tools? And is it easy to implement this in this complex cluster?
Kulbhushan Kalra
executiveSo this question is about our APEX extensions, right?
Linley Gwennap
analystYes, I think so, yes. Paul covered APEX and ASIPs, right, as extension options.
Kulbhushan Kalra
executiveYes. That's been one of the strengths of ARC: extensibility through APEX extensions. These are closely coupled instructions in each core. We provide a fully automated way for customers or users to implement these extensions, which automatically connect to the compiler and the debugger when you build an ARC core. Our compilers will recognize such extensions once they are defined by the user.
Linley Gwennap
analystOkay. And just to add, the tools make it pretty easy, right? We make it pretty straightforward for customers to add those custom instructions.
Kulbhushan Kalra
executiveYes. It's quite seamless: once those are defined in the ARChitect tool that we have, all of our tool chains, debugger, compiler and simulators will recognize them. And it's not just instructions: you can add your own registers, extend the core registers or auxiliary control registers, et cetera, and add closely coupled accelerators. It's all seamless.
Linley Gwennap
analystMakes sense. All right. All right. Back to our discussion. So obviously, one of the major focal points that Paul talked about yesterday was our 64-bit cores, the HS6x family. Are you seeing, Kulbhushan or Carlos, that 64-bit is becoming more and more of a requirement in embedded applications?
Kulbhushan Kalra
executiveYes, definitely. It's not just compute requirements that are increasing. In embedded applications, you see that 4 GB of DDR is not enough for many of these complex SoCs with multiple cores and high storage capacity to handle. You can do that with 32-bit processors using physical address extensions, and many processors have done it in the past, but doing it seamlessly in a native 64-bit core, which lets a real-time application access a physical address space larger than 4 GB, is becoming a common requirement now.
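The 4 GB ceiling mentioned here falls straight out of address width: n address bits reach 2^n bytes. A minimal sketch comparing a 32-bit address with the 52-bit physical address space quoted for the HS6x in Paul's presentation:

```python
GiB = 1 << 30
PiB = 1 << 50


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat, byte-addressed space of the given width."""
    return 1 << address_bits


print(addressable_bytes(32) // GiB)  # 4 GiB with 32-bit addresses
print(addressable_bytes(52) // PiB)  # 4 PiB with 52-bit physical addresses
```

Physical address extensions work around the 32-bit limit by mapping windows of a larger space, whereas a native 64-bit core can use the full space directly.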
Linley Gwennap
analystOkay. Makes sense. All right. Another question here. If I have my own hardware accelerator, does Synopsys provide support for integrating this into your ARCv3 cluster? Or is it simply a matter of supporting standard ACE-Lite interfaces?
Carlos Basto
executiveYes. Perhaps I can take that. So there is an ACE-Lite interface that connects to the cluster, and the cluster handles the coherency aspects, making sure that the accelerator is coherent with the caches of the ARC cores connected to it, as well as the shared Level 2 cache that exists in the cluster.
Linley Gwennap
analystOkay. Makes sense. So I assume it's then a pretty straightforward proposition, right, to basically integrate those accelerators into the architecture?
Kulbhushan Kalra
executiveSo the hardware integration is indeed standardized with ACE-Lite interfaces, plus we help on the software side with streaming examples of how to program these hardware accelerators and share the resources in the cluster with them. Those are the streaming examples that Paul showed in his presentation yesterday.
Linley Gwennap
analystOkay. All right. Good. Another question. You touched on the SIMD FPU in the HS68 in the presentation yesterday. Can you elaborate a bit on the performance we could expect, data types, et cetera?
Kulbhushan Kalra
executiveCarlos, do you want to take that?
Carlos Basto
executiveSure. So our 128-bit SIMD FPU, in terms of data types, supports half precision, single precision and double precision. And if you do the math, that engine is capable of sustaining 8 half-precision floating-point operations per cycle [indiscernible], or 4 single-precision operations per cycle, or 2 double-precision operations per cycle. It's important to note that the load/store path that feeds data into that FPU is also extended to 128 bits wide, keeping the FPU busy to sustain that level of throughput.
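The per-cycle figures quoted above follow from dividing the 128-bit datapath width by the element width. A minimal Python sketch of that arithmetic, assuming one operation per lane per cycle (if a fused multiply-add were counted as two operations, each figure would double):

```python
# Back-of-the-envelope check of the per-cycle figures quoted for a 128-bit
# SIMD FPU. Assumes one operation per lane per cycle; this is an illustration,
# not a model of the actual HS68 pipeline.

SIMD_WIDTH_BITS = 128

ELEMENT_BITS = {
    "half": 16,    # IEEE 754 binary16
    "single": 32,  # IEEE 754 binary32
    "double": 64,  # IEEE 754 binary64
}


def ops_per_cycle(vector_bits: int, element_bits: int) -> int:
    """Number of elements (lanes) that fit in one vector of the given width."""
    return vector_bits // element_bits


for precision, bits in ELEMENT_BITS.items():
    print(f"{precision:>6}: {ops_per_cycle(SIMD_WIDTH_BITS, bits)} ops/cycle")
```

This reproduces the 8 / 4 / 2 split described in the call; it also shows why the load/store path has to be 128 bits wide, since each cycle consumes a full 128-bit vector of operands.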
Linley Gwennap
analystOkay. Makes sense. I'm putting this on the chat. Just to remind everyone, if you have any additional questions, go ahead and just put them in the chat. All right. So let's go back to our discussion. So just changing gears a bit, Kulbhushan. We've been mostly focusing on performance and applications that are mostly RISC with DSP, tailored to our ARC HS and potentially EM processors. And we also just recently released a new family of vector DSPs. What additional applications would you say we're addressing now that we've released this VPX family?
Kulbhushan Kalra
executiveYes. So the VPX family is a VLIW architecture with vector widths up to 512 bits. It addresses growing floating-point computation needs well beyond what an ARC HS or HS6x type core can do with its SIMD FPU: it can have multiple floating-point units operating on vectors up to 512 bits wide, especially targeting RADAR and LiDAR applications, where massively parallel execution on floating-point data is needed. In addition, RADAR and LiDAR applications also require ISO 26262 compliance. So VPX comes with full ASIL B and D compliance, including safety-certified software.
Linley Gwennap
analystOkay. And that kind of leads me to my next question. We've obviously invested a lot in building ISO 26262-compliant IP. What are some of the hardware features and automotive work products that we provide to our customers that really help them down the line and getting them to market quicker and helping them with their certification goals?
Kulbhushan Kalra
executiveYes. At Synopsys, we have built a fully certified development flow for all our IP, including all the ARC processors, to protect against systematic faults. We have also created many key features to meet the ISO 26262 random-fault requirements: protection of registers, diagnostic error injection, windowed watchdog timers, and single-bit error counters for memory ECC, where memories are covered by single-bit error correction and double-bit error detection. These safety features come with robust documentation, delivered as part of our FS cores, that saves a significant number of man-years on the customer SoC side as they work toward ISO 26262 compliance.
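Single-bit error correction with double-bit error detection (SEC-DED) is classically built from an extended Hamming code. The sketch below is a toy extended Hamming(8,4) code in Python that protects 4 data bits; it illustrates the principle only. Real memory ECC protects wider words (for example, 64 data bits with 8 check bits), and nothing here describes how ARC processors actually implement ECC.

```python
# Toy SEC-DED code: extended Hamming(8,4). Index 0 holds overall parity,
# indices 1, 2 and 4 hold Hamming check bits, indices 3, 5, 6, 7 hold data.
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SEC-DED codeword."""
    if not 0 <= nibble < 16:
        raise ValueError("expected a 4-bit value")
    code = [0] * 8
    # Data bits go in the non-power-of-two positions.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # Each check bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity over positions 1..7 enables double-error detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status) where status is 'ok', 'corrected' or 'double-error'."""
    word = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall-parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detectable, not correctable.
        return None, "double-error"
    data = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return data, status


cw = encode(0b1011)
cw[6] ^= 1          # simulate a single-bit upset in memory
print(decode(cw))   # (11, 'corrected')
```

In a hardware implementation, each "corrected" outcome is the kind of event a single-bit error counter would record, while a "double-error" outcome would be reported as an uncorrectable fault.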
Linley Gwennap
analystThat makes sense. And one thing I'd add to that is, I think that some of the classic semi vendors that have been doing automotive for a long time certainly understand what it takes to build and deliver certified SoCs and end-product solutions. But with the automotive market kind of exploding the way it is, especially in China, there are a lot of newcomers to the market. And I would say that they don't have quite the level of expertise in automotive. What I've found is that what we provide is a significant help to their overall certification effort. First, we provide a level of expertise that they may not have in-house. Second, by generating work products that they can reuse immediately in their SoC-level work products for certification, they can save on the order of 10 man-years in their own certification efforts. So it's significant. And it's one thing I'd like to emphasize, because not all processor IP vendors are equal when it comes to that kind of thing. Certainly, there are many IP vendors in the automotive space, but I think what Synopsys provides is superior in terms of overall product.
Kulbhushan Kalra
executiveYes, I agree. It can be quite cost-prohibitive if the SoC team has to go through full certification from scratch for every IP they have purchased. It's already difficult and expensive for the SoC-level components alone, without any help from the IP vendor. And licensing processor IP from a start-up can create serious challenges in getting a functional safety audit completed.
Linley Gwennap
analystExactly, exactly. All right. Let me turn to another question here. AI is a hot topic. Is Synopsys addressing the broad range of AI applications, edge inference, et cetera?
Kulbhushan Kalra
executiveYes. This is indeed a hot topic. We also have neural network accelerators in our portfolio that address AI applications when a serious amount of hardware acceleration is needed, and the ARCv3 cluster that we designed for the ARC HS6 -- HS4 is indeed designed to couple closely with these hardware accelerators. And we're already doing that for our internal IP. I don't think that was covered in yesterday's presentation. Our NPUs, the neural network accelerators that we have, are closely coupled to this cluster, using the HS6s and HS4s and the cluster interconnect, to address AI applications.
Linley Gwennap
analystYes. And what I would add, too, is that's certainly what we're trying to do at the high end, right? When you think of classic training and neural network acceleration, and certainly, like you mentioned, Kulbhushan, our Embedded Vision products combine a vector DSP with our full range of neural network accelerators to meet a pretty high level of TOPS. We're also trying to address it at the low end, right? If you ask someone what AI is, you'll get 19 different opinions, whether it's the small Fitbit device that sits on your wrist or something all the way up in the data center. We're trying to address the low end as well. We're doing quite a bit in terms of adding DSP capability to some of our lower-end cores, like our EM processors, which makes them really good for what's now being called AIoT: the low end, where you need an always-on processor that can do voice recognition, gesture recognition, basic audio playback, things like that. So I think we're trying to address both the low end and the high end. The other thing we're trying to do is unify the software look and feel as well. For all of the TensorFlow and graph mapping that's done at the high end, there's also TensorFlow Lite for Microcontrollers, which maps some of that to the low end, and we're supporting that with our EM cores as well. All right. So in general, I would agree that AI is very much a hot topic, and I think we're doing a pretty good job of addressing it, at least the way we see it from an embedded perspective. All right. I see one more question here. What about security? Is this an area of focus for Synopsys processors?
Kulbhushan Kalra
executiveI think you can cover that yourself; you've been covering security for all our processors for years now.
Linley Gwennap
analystRight. So security is another one of those -- and I don't know the context of the question, but it's a very broad topic as well. Synopsys has invested a lot, and not just on the processor side; we have a ton of standalone security IP titles as well. So if you need a standalone crypto accelerator for AES or SHA-256, whatever you need, we have dedicated hardware accelerators that can be used in pretty much any system. We also have a very rich library of software: crypto libraries, secure boot and things like that. On the processor side, there are a couple of areas. We have what we call SecureShield. If you have a standalone processor, we have mechanisms that we support, in particular in our EM cores, so that you can use it as a standalone trusted execution environment: things like a secure MPU so that you can isolate different regions of memory and have different crypto keys for the different regions of memory, locking down the debug interface, and locking down the extension interface. We're trying to provide mechanisms that make it difficult for somebody to hack into the core through any of the potential ways in, so to speak. We also have a whole line of processors that we call our SEM cores. They share all of the same security features that I talked about for the standalone processors, but go a step further in that they also have side-channel protection. We have randomization of the timing and randomization of the power, so we can protect against differential power analysis attacks. All of those things are built into the SEM family of processors. And we're seeing those used in a lot of applications like smart cards and iSIM, now that SIM functionality is being integrated onto the device as opposed to having a separate SIM card, which is becoming more and more the norm, and anything that needs high-value data transfers, so anything that involves medical data or financial payments.
We're seeing that core do very well. And we actually just announced what we call an FS version of the SEM core, too; we announced it this week. So you now have the ability to take that side-channel-protected SEM core and bring it into a certified compliance environment. So if you have an automotive requirement for, say, a hardware security module, which we also have products for, and you want a dedicated security block that takes care of the trusted execution environment for an entire automotive device, then this SEM130FS, which is what we call it, will be a certified compliant solution. It will meet all of the requirements up to ASIL D, with lockstep capability, and we follow the same systematic development flow for that SEM core that we use for all of our functional safety processors. So it's kind of combining the best of both worlds, so to speak, with both safety and security, which we're seeing go very much hand-in-hand, because just like with safety, you can't just add security later. You don't build an SoC and then say, "Okay, I need to add safety now," or build an SoC and say, "I need to add security now." You really need to architect these things from the ground up and think about what that architecture needs to look like to make sure you're meeting your high-value security needs as well as your safety goals. And those are the things that go hand-in-hand with a core like the SEM130FS. All right. Well, is there anything else that maybe we can elaborate on a little bit? Maybe, Carlos, you can take this one. I know you know Paul's presentation from yesterday very well. I don't know how many people on the chat here had a chance to listen to Paul's presentation, but maybe you could summarize a couple of the key points that he brought up.
Carlos Basto
executiveSure. I think one of the key points of the talk yesterday was about the scalability of the solution, right? Paul had one slide titled ARCv3 Is All About Scalability. What that means is that whether you're designing a system that only needs 1, 2, 3 or 4 cores with a certain performance level or you need to scale all the way to 12 cores, we have designed a cluster that is scalable enough to fit both profiles: a very small, low-power, low-latency, efficient cluster with only 2 or 4 processors, all the way up to a cluster with many, many processors, plus hardware accelerators that can have huge bandwidth requirements, like a CNN engine, as an example. So that's perhaps the key point of the talk yesterday that I would highlight.
Linley Gwennap
analystYes. No, that's a really good point. And I think, at least from what we're seeing, and Kulbhushan touched on it at the beginning, embedded applications in general are moving in this direction. I think that was largely the impetus for us to build it. There's multidirectional acceleration, or performance, that you can pursue, right? You can improve single-core performance, and that's certainly something that everybody is always trying to do. But we want to go in both directions: improve single-core performance, but also create a cluster-based architecture that makes it easier for our customers to go multicore, and that can be multicore with our processors. And, like you mentioned, Carlos, you want to be able to extend that so that customers can easily add their own hardware accelerators or different processing elements to the cluster, because, especially when you look at things like AI, the processing requirements are becoming much more specialized. It's not one-size-fits-all, where you drop a processor in there and it takes care of everything. These things become very unique, and each vendor has its own spin on how it wants to approach its own market, SoC, or whatnot. And we're trying to make that as easy as possible.
Carlos Basto
executiveRight.
Linley Gwennap
analystAll right. Well, are there any other questions on the chat? I don't see any other questions. Let's see how many people are on. Maybe I should just unmute everybody since we don't have that many people, and you can just ask your questions. [Operator Instructions] Anyway, if you didn't want to put it in the chat, I think I've enabled everyone so they can unmute. If you have a question, feel free to unmute yourself and ask anybody on our panel here. Okay. All right. Well, feel free to go ahead and type in the chat. One thing that we started to talk about a little bit, but I don't know that we covered in much depth, is some of the other products in the portfolio. We talked a lot about functional safety. One thing that we are finding is very, very important is being able to create a common look and feel. When you're building functional safety IP, whether that's interface IP or processor IP, you have to start from the basics in terms of creating a systematic development flow. And one thing that Synopsys has invested quite heavily in over the last 6 or 7 years is developing that systematic flow and having a QMS process that's ISO 9001-compliant, such that whether you're licensing a USB controller or an ARC processor, the development flow that you can expect as an SoC vendor or even a Tier 1 OEM is going to have the same look and feel, and you're going to get the same kinds of work products. You're going to know that we have put checklists and process flows in place to minimize the chance of bugs getting into either our software or our hardware. And I think that's important.
So you know that if you're licensing any IP from Synopsys, you're going to get high-quality IP that's suited to whether you have an ASIL B, ASIL C or, in many cases, ASIL D requirement. And that's something we've definitely taken to heart on the processor side itself. We touched on the embedded vision processors, where we're combining the scalar core and the vector DSP engine with neural network accelerators; those are typically used for lots of ADAS types of applications where vision is important and they need that neural network acceleration. The VPX cores that we recently launched, which Kulbhushan talked about, are standalone vector DSP cores typically being used for RADAR and LiDAR applications in automotive. Again, for everything we do, we make sure that we follow a similar look and feel in terms of overall functional safety, with the development flow as well as the work products that we deliver as part of the safety package that customers get when they license one of those cores. It also applies to our EM cores, which we touched on: our smaller 3-stage pipeline processors, typically used for body electronics or as a dedicated safety manager where no other functionality really needs to be layered on top and the focus is mainly on the safety management of a larger SoC. And our HS cores, certainly: we have quite a few different flavors of HS cores, including the functional safety versions that Paul talked about yesterday, where an HS core can be used as a dedicated safety manager, similar to the EM, but where higher performance is required, maybe because that safety manager needs to run Linux or another rich RTOS, all the way up to a full application processor where I need, for instance, a quad-core -- full dual-core lockstep implementation, and we can certainly support that with our HS functional safety processors.
Anyway, I wanted to highlight that because it's an important aspect of what we've been focusing on across the board in the ARC portfolio. And I think it gives us a unique approach. We definitely go the extra mile, more than I think some of our competitors do, and I think that's important. All right. Any other questions? Okay. Anything else that you guys think makes sense for us to discuss or talk about?
Steve Cox
executiveNo, thank you for hosting this breakout session.
Linley Gwennap
analystIt looks like you're muted again. [Operator Instructions] There we go.
Steve Cox
executiveThis is Steve. Thank you, everyone. So yes, I'm with Synopsys as well. I'm a Processor Specialist. So in particular [indiscernible] from a sales perspective. And of course, if you'd like to follow up on any of this, we'd be happy to discuss it further with you. Do you have any questions or comments that you'd like to bring up? Okay. And I noticed there's another attendee that's only showing up with a phone number. Go ahead.
Linley Gwennap
analystSorry, you're breaking up, Steve, for me anyway.
Steve Cox
executiveSorry [indiscernible]. Can you hear me?
Linley Gwennap
analystI can hear you now. Yes.
Steve Cox
executiveOkay. Not before?
Linley Gwennap
analystI heard the beginning part, and then it kind of started to break up in the middle.
Steve Cox
executiveI stopped using my computer to connect the audio. I noticed there's [indiscernible] with Synopsys, for the benefit of the other people on the line. I noticed someone is coming in with a phone number, 6335, et cetera. I don't know if you're able to [indiscernible] in or not, but if you have any questions, please let us know. I don't have anything that [indiscernible] please reach out to us if you have interest in getting more information. Mark and I know [indiscernible] [Operator Instructions] in the past. He's our salesperson in Southern California, so he's a good person to chat with. He can bring me into the conversation as well as the rest of these guys. There you go, Mark. You're off mute now.
Unknown Attendee
attendeeThank you, Steve. Thank you, Steve. Sorry, I didn't hear before. Yes, yes, yes. I will speak with [indiscernible], definitely. Good guy.
Steve Cox
executiveYes. He said the same about you. So -- [indiscernible] your interests or other titles in the ARC pipeline, please let us know; we'd be happy to talk to you about it. I've known your company a little bit over the past few years. I don't know a lot about it, but I know a little bit. [indiscernible] was just giving you the update as I was listening to the detail.
Unknown Attendee
attendeeThank you for the interesting presentation.
Steve Cox
executiveYes. Yes. Of course.
Linley Gwennap
analystOkay. Yes, I mean we tried to -- obviously, there are several families in our portfolio, we could spend a day on each one. So we try to touch on [indiscernible] here in this discussion. Certainly, in yesterday's Linley presentation, we were focusing a lot on our next-gen ARCv3, in particular, our ARC HS cores. But certainly, any interest that you have in any of the deeply embedded cores or vector DSPs or Vision engines or anything that's in the portfolio, we're happy to do a follow-on presentation. All right.
Kulbhushan Kalra
executiveThe Zoom worked today, unlike yesterday's. That was a disaster.
Linley Gwennap
analystYes. Unfortunately, it seems like not as many people were able to rejoin today from yesterday, but yes, using the Synopsys.
Kulbhushan Kalra
executiveI think past 2 days does not help.
Linley Gwennap
analystYes, exactly. I don't know if she's still on, or maybe -- do you know if Linley provides a generic e-mail, or how do people get in touch with us if they have questions that didn't get addressed in this chat? Or maybe, Hannah, you know?
Kulbhushan Kalra
executiveYes. I can check with them. To be honest, if there's some way for them to have attendees contact us, we can certainly follow up if anyone's interested. I believe we can follow up with those who joined the breakout session yesterday or who tried to join, so we should be able to follow up with them as well.
Steve Cox
executiveIf there's anyone who [indiscernible] working to contact me directly, my e-mail, so this is Steve Cox, my e-mail is easy, it's [email protected].
Linley Gwennap
analystOkay. Well, I don't have any more questions or any more topics to discuss. So if there are no more questions on the chat or in the session itself, then I think we can close it out.
Carlos Basto
executiveYes. Thanks, everyone, for participating.
Linley Gwennap
analystAll right. Thank you. Talk to you soon.