International Business Machines Corporation (IBM) Earnings Call Transcript & Summary

May 21, 2020

New York Stock Exchange US Information Technology IT Services conference_presentation 62 min

Earnings Call Speaker Segments

Jeff Summers;Program Director of Development

executive
#1

Hello, and thank you for joining this IBM Technical University virtual session, proactive hybrid multi-cloud management with AIOps and IBM Z. My name is Jeff Summers. I'm the Program Director of Development with IBM, and I'm joined today by my colleague, Dan Wiegand, Senior Offering Manager. We are both with the IBM Z operational analytics portfolio and work out in the RTP lab in North Carolina. Before getting started, I'd like to mention a few highlights and some key points that we intend to convey in this session. First, as businesses continue the journey through digital transformation, the IT organizations' ability to adapt is critical to success, and IBM Z is a core and critical component. Second, access to data is crucial, getting the right data to the right people at the right time and with the right context. And third, leveraging innovative technologies, such as machine learning, the IBM Z Operations Insight Suite provides the capabilities for success. Let's talk for a minute about the ever-changing IT landscape. It couldn't be more visible than what we are seeing today with impacts of COVID-19 as companies deal with transformation more than ever. User behaviors and priorities have changed overnight, resulting in more frequent and less predictable workloads. As I think about my own situation, 2 months ago, the most important apps to me were my airline and hotel apps to manage my travel and my loyalty programs. Now my most important apps are my grocery store apps, allowing me to order online, schedule deliveries, substitute out-of-stock items and leave tips for the delivery folks and access to apps that allow me to have video chats with my extended family. And I believe that many of these changes will be permanent in many aspects, resulting in more and more demands on IT. So IT operations needs to innovate to deal with the changing demands impacting availability, performance and capacity from both an IT resource and a people resource perspective. 
Everyone's role in the organization is somehow impacted. For the enterprise that wants to be at the forefront of embracing the challenge, there are 3 key needs that have been identified. We look at these as capabilities that a modern digital enterprise requires in order to leverage technology advancements, such as the implementation of machine learning in a practical sense, to achieve a successful transformation. First, we need an effective means of managing and evolving a hybrid multi-cloud environment that is often fueled or underpinned by IBM Z. Second is the need to drive or maintain operational resiliency. We'll see in this session how we can apply AI techniques to power this. And third, in this complex environment, there is the need to simplify and integrate across domains to remove organizational silos and make IBM Z as open to the enterprise as any other platform. When managing the complexity of the IT environment, there are a number of stakeholders spanning multiple roles and organizations from line of business to enterprise operations to mainframe operations, capacity planners and subject matter experts. All of these stakeholders have an interest and a part to play in ensuring the successful performance and availability of end-to-end hybrid applications. Here, we see the 6 key capabilities delivered by IBM Z Operations Insight Suite that will be the focus of this presentation. We believe these capabilities enable the various stakeholders for success. Data streaming, collecting and distributing operational data across the enterprise. Problem identification, quickly identifying root cause for issues impacting operations. Anomaly detection, understanding what is normal in an environment and then identifying deviations from normal. Performance analysis, evaluating data over time to understand and prevent longer-term performance issues. 
Capacity forecasting, projecting what capacity requirements will be in the future to be able to make educated decisions affecting capital investments. And finally, cost management, understanding and controlling workload contributions affecting MSU costs. And now I'll hand it over to Dan to take us through the first set of capabilities.

Daniel Wiegand;Senior Offering Manager

executive
#2

Thanks for the introduction, Jeff. I'm going to take you through the first 3 of the 6 capabilities, starting with data streaming. So let's talk a little bit about our data streaming capability provided by IBM Z Common Data Provider. We really can't do any kind of analytics without having access to operational data. And Common Data Provider gives us access to all of the IBM Z operational data in near real-time. It provides access to the largest breadth of IBM SMF types, including some third-party types like ACF2 and Top Secret, in addition to IMS log, RMF III and various other log types. It lets you build your analytics ecosystem by collecting data once and making that data available to one or more platforms. CDP is very extensible in that it allows you to add new data types through configuration. So if there's a missing SMF type that's from either a proprietary or a legacy application, you can simply add that through configuration without having to wait for IBM development. One of the key capabilities of Common Data Provider is the advanced filtering that is built in. The two primary ways that we let you filter data are field-level filtering and record-level filtering. For field-level filtering, think of SMF data as being very voluminous, with a record maybe having 100 or 150 different fields. We let you cut that down to the 10, 20 or 30 fields that you need to do your analytics. Record-level filtering is another way that we filter data. We built this, as we do with much of our software, in partnership with one of our large banking customers here in the U.S., and we do this by providing a set of criteria that select the records that we want to stream. So the example use case that our customer brought to us is they wanted to stream CICS 110 records to their Splunk environment. 
But they didn't want to stream all of the tens of thousands of those records that could be generated every second. They wanted to send only the records that were most important to them, the ones that matched a very specific transaction ID for the CICS transactions that map to their key business application. But they didn't want to send all of those transactions. They just wanted to send the ones that were long running, either taking greater than 2 seconds or ending in an abend. So those are 2 very different ways that we can filter data, saving customers on network bandwidth, storage or data ingestion charges, or really just making sure that we get the right data to the right place to do our analytics. And the last consideration that we have with our Common Data Provider is that we've really spent a lot of time over the last couple of years focusing on performance. Performance isn't one of those things that we do once and check the box. It's one of those things where we always try to innovate and keep finding ways to make the system perform better. And I'm happy to say that 100% of the z/OS run time components are eligible to run on zIIP. And I'll talk in a couple of slides about some of the significant results that we've achieved in that space. So think of our Common Data Provider as starting out as just a collector of Z operational data, SMF and log data, in addition to a streaming API that we've added to allow third-party applications to write to their data streams. We've really seen this solution evolve over time toward being really strategic in enabling a lot of what we do around IT ops analytics. Since the original release of Common Data Provider, we've had a number of the broader IBM products have a need from their customers to stream data through Common Data Provider, things such as performance capacity analytics, which Jeff will talk a little bit more about later in the presentation. 
z/OS Connect, IMS Connect extensions, NetView and Workload Scheduler are also making their data and logs available to be consumed by Common Data Provider, since we do have customers that want to use those across their entire analytics ecosystem. And what I've seen in talking to a lot of our customers is really a need to get very different data to very different solutions. And we can see here on the chart a number of those different solutions that we support. And I've even seen customers do things that aren't even listed on this chart. But the 3 key platforms that we support are the IBM Operations Analytics Log Analysis platform, Elastic Stack and Splunk. Those are probably the 3 most common analytics platforms that I see customers want to send their data to. We've also had other requirements for presenting data to platforms like Sumo Logic, Hadoop and syslog. And we've done that through Logstash, using a third-party plug-in to pull that data from Logstash and bring it into those platforms. And Jeff talked about our anomaly detection capability, which I'll talk about a little bit later in this presentation. There, we use our Z Common Data Provider to stream, again, real-time data to Db2, which is our machine learning database. So we use our Common Data Provider to drive almost everything that we're doing from an IT ops analytics standpoint. So I wanted to take a couple of minutes and quickly talk about some of the significant performance results that our team has had for one of our components that runs on z/OS. So Common Data Provider has 3 key components that run on z/OS: our System Data Engine, which is responsible for processing all of the SMF data from collection to filtering; our Log Forwarder, which is our log data gatherer; and our Data Streamer, which takes data from both of those first 2 components and then streams it out to our analytics platforms. 
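To make the field-level and record-level filtering described earlier more concrete, here is a minimal sketch in Python. It is not Common Data Provider configuration or code; the record layout, the field names (`transaction_id`, `response_seconds`, `abend_code`) and the helper names are all hypothetical, chosen only to mirror the banking example: keep records for a specific transaction ID, and only when they run longer than 2 seconds or end in an abend, then trim each record to the fields needed downstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordFilter:
    """Record-level criteria (hypothetical names, modeled on the talk's example)."""
    transaction_ids: frozenset
    min_response_seconds: float = 2.0

    def matches(self, record: dict) -> bool:
        # Only transactions that map to the key business application.
        if record.get("transaction_id") not in self.transaction_ids:
            return False
        # Of those, keep only long-running or abended transactions.
        long_running = record.get("response_seconds", 0.0) > self.min_response_seconds
        abended = bool(record.get("abend_code"))
        return long_running or abended


def project_fields(record: dict, keep: tuple) -> dict:
    """Field-level filtering: keep only the named fields."""
    return {name: record[name] for name in keep if name in record}


def filter_stream(records, record_filter: RecordFilter, keep: tuple):
    """Apply record-level filtering first, then field-level filtering."""
    for record in records:
        if record_filter.matches(record):
            yield project_fields(record, keep)


# Illustrative records only; real SMF 110 records have many more fields.
sample = [
    {"transaction_id": "PAY1", "response_seconds": 0.4, "abend_code": "", "cpu_ms": 3},
    {"transaction_id": "PAY1", "response_seconds": 2.7, "abend_code": "", "cpu_ms": 9},
    {"transaction_id": "PAY1", "response_seconds": 0.2, "abend_code": "ASRA", "cpu_ms": 1},
    {"transaction_id": "MENU", "response_seconds": 5.0, "abend_code": "", "cpu_ms": 7},
]
KEEP = ("transaction_id", "response_seconds", "abend_code")
streamed = list(filter_stream(sample, RecordFilter(frozenset({"PAY1"})), KEEP))
```

With this sample, only the slow `PAY1` record and the abended `PAY1` record are streamed, and neither carries the `cpu_ms` field. Filtering records before projecting fields means the criteria can still use fields that are not forwarded.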
I mentioned that 100% of our components are zIIP eligible, but the System Data Engine really achieved significant results, and the team went above and beyond in what they've done to offload processing to the zIIP processors. And from some of our internal lab testing, we found that 99.79% of that workload was able to run on zIIP. As always, there's the disclaimer at the bottom. So this was done in our internal lab, and your results may vary slightly. But I'll touch on just a couple of the things that we had as part of this test. First and foremost, you have to have the zIIPs available for that workload to run on. We were reading a pretty significant amount of data, roughly 300 gigabytes of data a day from an LPAR, looking at SMF 101 and 110 records, so Db2 and CICS records. We had plenty of memory to run these tests. But one of the other key ones is we did this with the SMF in-memory resource. So those are just a couple of notes about how we set up and ran these tests, but we were able to achieve very significant results when streaming this kind of SMF data. So now that we have access to all of that operational data, let me talk a bit about how we use that data from an analytics standpoint. And I'll start first with our problem identification, or really how we examine that operational data in a broader context. So one of the first things we need to look at is how we make that data usable by the analytics solutions that we're sending data to. And this is really where we start talking about the value of what we call curated data. So if you think of a lot of the SMF data that we get, there are a lot of cryptic fields and things that need to be calculated. And one of the things that we do is a lot of that pre-calculation of very common things ahead of time, before the data even leaves the IBM Z platform. So things like CPU utilization, paging rates, I/O rates, all of those things that need to be calculated from various fields of that raw data. 
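As an illustration of what "curated data" means here, the sketch below derives a few ready-to-use metrics from raw interval counters. It is an assumption-laden example, not the product's actual calculation: the input field names (`interval_seconds`, `online_cpus`, `cpu_busy_seconds`, `pages_transferred`, `io_operations`) are hypothetical stand-ins for the kinds of raw fields a record might carry.

```python
def curate(raw: dict) -> dict:
    """Turn raw interval counters into derived metrics (hypothetical field names)."""
    interval = raw["interval_seconds"]
    if interval <= 0:
        raise ValueError("interval_seconds must be positive")
    # Total CPU-seconds available in the interval across all online processors.
    capacity = interval * raw["online_cpus"]
    return {
        "system": raw["system"],
        "cpu_utilization_pct": round(100.0 * raw["cpu_busy_seconds"] / capacity, 2),
        "paging_rate_per_sec": round(raw["pages_transferred"] / interval, 2),
        "io_rate_per_sec": round(raw["io_operations"] / interval, 2),
    }


# Illustrative one-minute interval for a made-up system name.
raw_interval = {
    "system": "SYS1",
    "interval_seconds": 60,
    "online_cpus": 4,
    "cpu_busy_seconds": 150.0,
    "pages_transferred": 1200,
    "io_operations": 90000,
}
curated = curate(raw_interval)
```

The point of doing this before the data leaves the platform is that every downstream analytics tool receives the same precomputed values instead of each one reimplementing the arithmetic.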
The other thing that we try not to do is overwhelm a lot of the analytics platforms that we could be sending data to. So IBM Z is probably one of the most well-instrumented pieces of technology and generates terabytes and terabytes of operational data in a day. So I'll go back to the CICS example. They're generating thousands or tens of thousands of records for every transaction that gets run. It's an awful lot of data that could really overwhelm the ingest or search capabilities of a lot of the target platforms. So we have a way within our Common Data Provider and with our Problem Insights solution to summarize those records, so that we can look at maybe one record per minute instead of thousands over just a second. The other thing that we do is use our own filtering to pull out the most important fields that we want to drive those analytics solutions. So we're really doing a lot with the data before it even leaves the platform, so that we can drive those analytics solutions and provide as much insight into that data as possible for customers that are going to consume it. So one of the first ways that we provide insights into the Z data for the purpose of problem identification is by delivering a set of dashboards and quick searches, and we do that by delivering native applications across the 3 platforms: the IBM Operations Analytics Log Analysis platform, Elastic Stack and Splunk. And we do that as native applications because we know a lot of customers want to have the same look and feel. They want to integrate IBM Z into their existing environments. And from the dashboard side, we provide a set of subsystem dashboards across z/OS, WebSphere, Db2, CICS, networking and security, to name a few. And in fact, if you can see the front dashboard there, that's one of our security dashboards that we provide across all 3 platforms. 
So you can really start to get some quick visuals of the data. So if you look at the pie chart on the left, that's the users that have had invalid log-ins to the system. So we can see we have a pretty nice range of those. Everyone forgets their password or types in the wrong set of characters from time to time. But when you put that in context with the pie chart on the right, we can see all of those invalid log-ins are coming from the same terminal. So something is not quite right there. So we can see that maybe someone is trying to gain access to a system that they might not be authorized to use, which is also why we have other things in this dashboard that show invalid access requests, maybe for very critical data sets or data resources. So this is just one example of how we provide insights from a security perspective. If we're looking at CICS, we might see some things that are running at a Sysplex level, right? So maybe the top transactions and what they're doing from a transaction rate or a CPU rate, really just to see what the real operational health of these systems is. So these dashboards are provided ready to use out-of-the-box. But what we find is a lot of our customers will take pieces of these dashboards and weave them together to tell their application or analytics story. So there's a lot of great content and insight here, and we've had some of our customers do very interesting things in how they bring these together. And the other interesting thing is that I've seen a number of our customers have a hybrid approach to how they do their analytics, where in a lot of cases, they're not just focused on one platform, but they have someone that's using Splunk in one part of the organization and another group in development that's using Elastic Stack. 
So being able to stream data to multiple platforms after collecting it once, and then providing this insight content across a variety of platforms, really provides a nice way to go into environments for our customers and provide insight into IBM Z where they may not have had it otherwise. The other thing from the problem determination side that I don't want to leave out is the quick searches that we have. So as we're bringing log data into the platform, if we have something that's going wrong, whether it be an MQ channel issue or a CICS short on storage or a Db2 lock, we can quickly search for those kinds of things that are going on within the system to really get down to, again, that root cause identification that we're trying to provide across all these different analytics platforms. So the next dashboard that I wanted to cover specifically by itself is our Problem Insights dashboard. This is really our dashboard to surface important messages and anomalies. So you can think of this as an aggregator of all of the potential issues that might be happening on the mainframe. So we can start seeing messages that come in. Is it a highly critical message? Is it something that might build up over time, like a data set allocation issue that happens once every few minutes? One of those might happen on occasion. But if we start seeing large numbers of those issues happening, it might point to a broader problem. And what we've done with a lot of these messages is try to encode a lot of the IBM subject matter expertise into them through the suggested actions that are there. So when we see these messages occurring on the system, we can click on the recommended actions from IBM on how to respond to them. The nice thing about that set of suggested actions is that they're configurable. 
So our customers can say, "Hey, maybe we have a run book that we use to respond to these kinds of messages." So we can go back and edit the suggested actions and put in the link to the run book that we use to respond to that. So it really allows our customers to encode their own knowledge into the tool as well, whether it be updating the suggested actions for the messages already provided out-of-the-box by IBM or even providing their own sets of messages that they want to make operations aware of, along with their own set of recommended actions. The other nice thing is that since all of this log data is coming into these analytics platforms, we can always quickly go out and view the evidence and view the raw messages that are coming in. So if we want to examine the log messages, that's really what the power of some of these analytics platforms is meant for. So speaking of analytics platforms, one of the things that I like to do is really take advantage of some of the unique [indiscernible] that the platforms that we're integrating with provide. And one of those that we've done, and it's really been through a great partnership that we've had with Splunk, is how we bring IBM Z into Splunk IT Service Intelligence. And this is really good from the standpoint of trying to correlate business information with your underlying IT infrastructure. So the example that we came up with Splunk was the glass table that you see here on the screen. And it's really about maybe an online bank that has an online and a mobile tier. And we can see the number of customers and maybe the amount of revenue that's being generated from those customers coming in from those different tiers, and then how they access the system all the way through to the back-end IBM Z. So they might have an API layer, they could have a web tier or a middleware tier, and then it's back-ended by all of the transactions that happen on IBM Z. 
But as different things start to happen within that infrastructure, maybe if we have something going on in the middleware tier that's impacting the online service, we can correlate those business results. So we can see maybe a drop in the customers that are able to access the service, or in revenue, and correlate that to what's going on within the underlying IT infrastructure. So the great thing about these glass tables is they let you drill down and get into the underlying KPI metrics and even down to the underlying data behind those to really drive that correlation and see, not only from an IT perspective, but if I'm coming in as an application or a line of business owner, what's going on with my business and what's going on with the IT infrastructure that's helping support that. And I can start to see these little hotspots that pop up, where if I do have issues, I know where to drill into first. So next, I want to take a look at what we're doing around anomaly detection. And really, this is where we're trying to allow our customers to be much more proactive in how they identify problems, getting to where they can do that in advance of their occurrence. So a lot of what we've seen already is if something happens, we want to go and identify root cause. Now let's take a step back and look at how we can use that operational data to do this anomaly detection and really try to be a lot more proactive. So we'll go through a couple of charts, and then I'll get into some key examples of how we're going to do that. So the first thing I want to go through is the journey that we're on as we try to embrace machine learning technology and be much more proactive in our operations. So I want to go through this line chart here of how we're tackling the problem that we have. So the first one is really learning what's normal from history. 
So this is taking a lot of your existing data and, from that, learning how you operate on a typical day-to-day basis. So there are a couple of things that I want to talk about as we go through and tackle this problem of learning what's normal. So the first one is that normal is not necessarily what is optimized. Those are 2 very different paradigms or perspectives when looking at how an environment operates. So we really want to get down to what's normal, so that we can start to detect deviations and trends away from that normal. There are other tools and solutions that look at how to optimize environments, but we don't want to try to detect anomalies off what would be an optimized environment. We really want to do it off what your environment looks like today. And if we go through some of the efforts to optimize how our system behaves by moving workloads around, all that means is that we've changed what our normal is now. We just need to go back and re-baseline what's normal. So there's a very different perspective on normal versus what is optimized. And we're really trying to understand what's normal for your environment. The second thing that I want to cover is data seasonality. And there are a lot of different ways that we can determine how granular we want to be in building a model of what's normal for your environment. What we've landed on is days of the week versus weekends, because the more granular you get, the more data you have to have. So your workload at 9:00 a.m. on a Monday morning is going to be very different than at 3:00 a.m. on a Sunday morning. So very different things are happening within the system. We could try to take in some of the other seasonality things, taking a step back and looking at things like payroll happening every 2 weeks or twice a month, maybe end-of-the-month reporting or billing, same thing at the end of the quarter. 
And different industries have different seasonalities as well. A lot of the financial customers here in the U.S. have tax season between February and April, right? So they have slightly different workloads and things that are going on during those periods of time, whereas over the holiday season in November, December and even into early January, retail and a lot of our shipping customers have very different workloads. But if we tried to build a model to take into account all of those different seasonality things I talked about plus different holidays, we would really need a couple of decades' worth of historical data to build that normal baseline. And that's something that I know none of our customers would likely have: one, to have kept the same hardware and same infrastructure in place for that period of time; and two, to have made no changes to their underlying infrastructure. So we've really tried to land, again, like I said, on those days of the week versus weekend as a good representation of what your environment is. And really what we're doing here is a lot of statistical analysis to understand what the bounds are on that environment. And we'll see this as I go through a couple of examples of some of the KPIs that we've done. So once we have that normal, or that model of your environment, that baseline, what we look to do then is look at real-time data and how that compares to that baseline. And what we're looking for is operational anomalies. And anomalies don't always mean that there's a problem. It just means that there's something that is different. 
So there might be an easy way that we can go back and explain away why we had this operational anomaly, maybe we had to go and start a job that we wouldn't normally run on this period, but it kind of caused some spikes in some of our maybe database connections or something that's -- hey, we knew this ran for 30 minutes, but it's gone and everything kind of goes back down into that normal phase. That's just a different behavior. It's not necessarily an operational anomaly. Kind of what we're looking for is those little trends that start to deviate from normal or even some of the large kind of spikes that might come in to play as well and treat those as real anomalous behavior. Kind of as we go down the journey that we have, the next phase is really again going back to thinking about what we talked about with Problem Insights is how we deliver guidance in response to those operational anomalies. So we want to say, when we start seeing certain KPIs go anomalous, what do we do to go and respond to when those things happen? And then ultimately, with the ultimate goal of using well-proven automation to respond to operational anomalies, really allowing the system to manage its normal health. So getting up to the autonomous operations part of their journey. And I will say there's an awful lot of work that goes into really each one of these different phases. There's a good bit of data science and really deep understanding of the subsystems that we have. So we've chosen to start with Db2 and CICS in our most recent release of the solution. We are working on other subsystems in the future, but what we've really done is taken those couple of subsystems and broken them down into a set of KPIs. There's roughly 80 or 90 or so for Db2, 60 to 70 for CICS. But really trying to understand what are the key things that subject matter experts of those systems really want to look at and really kind of understand if they're operating within normal bounds or not. 
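The "learn what's normal, then compare real-time data to it" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Watson Machine Learning for z/OS builds its model: it assumes a simple mean and standard deviation per bucket, and it assumes the bucket is the day type (weekday or weekend) plus the hour of day. The function names and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def bucket(ts: datetime) -> tuple:
    """Assumed seasonality key: weekday vs. weekend, plus hour of day."""
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    return (day_type, ts.hour)


def build_baseline(history) -> dict:
    """history: iterable of (timestamp, KPI value) pairs taken from normal days."""
    grouped = defaultdict(list)
    for ts, value in history:
        grouped[bucket(ts)].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def deviation(baseline: dict, ts: datetime, value: float) -> float:
    """How many standard deviations a new value sits from the bucket's mean."""
    mu, sigma = baseline[bucket(ts)]
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


# Made-up history: Monday 9:00 a.m. is busy, Sunday 3:00 a.m. is quiet.
history = [
    (datetime(2020, 3, 2, 9), 100), (datetime(2020, 3, 9, 9), 110),
    (datetime(2020, 3, 16, 9), 90), (datetime(2020, 3, 23, 9), 100),
    (datetime(2020, 3, 1, 3), 10), (datetime(2020, 3, 8, 3), 12),
    (datetime(2020, 3, 15, 3), 8), (datetime(2020, 3, 22, 3), 10),
]
baseline = build_baseline(history)

# The same reading is unremarkable on Monday morning and far out of bounds
# in the middle of Sunday night.
weekday_dev = deviation(baseline, datetime(2020, 3, 30, 9), 105)
weekend_dev = deviation(baseline, datetime(2020, 3, 29, 3), 105)
```

Re-baselining after a workload change, as described earlier, amounts to calling `build_baseline` again on the newer history.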
So next, let's kind of go through and map the paradigm that we saw on the last slide to the underlying technology that's going to drive this. So this is our machine learning data and analytics flow. And this is really for the purpose of training, scoring, visualization and anomaly detection. So again, if you think back to what we talked about with data streaming and the solution we have there, we use our Common Data Provider again to load historical data in batch into our machine learning Db2 database and to build a model, like we talked about, of days of the week versus weekend. We really need a month or 2 months' worth of operational data to build that model. And what we want to do is really look at the normal days. So if there are days where we had operational anomalies, where maybe we had an issue that happened within the workload for a couple of hours, we just want to take that data out of the data sets that we're using to train and build the model. So once we have that data loaded into the database, we'll let our Watson Machine Learning for z/OS engine create a model that's representative of your specific environment. So once we have that model, we can move into that next phase. And this is where we use, again, our Common Data Provider to stream data in real time into our ML database. So initially we're looking at, like I mentioned, Db2 and CICS. We take that real-time data, and then the Watson Machine Learning for z/OS engine will score that data against the model that was built for your environment. And what it's going to generate is an anomaly score. And don't try to strain too much to read the little graph there. We'll go through a couple of examples in more detail. But once we have those anomaly scores, we really want to look for deviations from normal, where we are not operating within the normal bounds that we've seen over those normal periods. 
And what do we do with that data? Are we trending to go up? Are we looking at having a broader problem that we need to consider? Let's go and take a look at what this looks like from a code and technology standpoint now through visualization. So this is a view of our Problem Insights dashboard, and really how we can start to visualize some of those scorecards that we have for our subsystems. So we can see we have our different Sysplexes listed here and the different subsystems that we support today, Db2 and CICS. So let's jump into one of our Db2 scorecards. So I mentioned that Db2 was broken down into roughly 80 to 90 or so different KPIs. And what we've done is we've grouped those KPIs into different groups, things like data sets or locking or logs, really trying to combine those into usable or readable groups. We can also see that this scorecard is coming from a relatively normal day. The average there is in the light blue, in the normal range. So let's drill into one of the groups, and we can take a look at a couple of the different KPIs. If we expand the DDF group, we can see the list of KPIs that are under this group. So we have things like ACTIVE_DBATS, DBATS_NOT_USED, various different KPIs, and all these KPIs get rolled up in aggregate to that top-level DDF group. So let's jump into one of the specific KPIs and see what that looks like. If we take a look at our ACTIVE_DBATS, this is one of the KPIs that we have. This is actually looking at a little bit of historical data from a couple of months ago. But we can see the black line that's on the graph here. This is a real-time scoring line. So this is the real-time data that's getting scored for the number of ACTIVE_DBATS. And we can see we had a flurry of activity that happens between 7:00 and 9:00 a.m. But if we take a look back and look at the varying gradients of the different colors that are there, we can see that we have a range that's in light blue. 
And that's really what's normal for our environment for this KPI. So we can see that flurry of activity that happens is all well within the normal bounds for this period of time. There's the middle shade of blue that represents between 3 and 6 standard deviations away from normal. And then the dark blue section represents more than 6 standard deviations from normal. So on this day, for this KPI, we can see that everything was very healthy and normal. So let's go back and take a look at one other KPI. So if we take a look at the TCB KPI, or the TCB Time KPI, on another relatively normal day, we can see it looks very different, a very different pattern than the KPI we just looked at. But again, we can see this nicely stays within the normal range of this environment. So let's go take a look at what happens when we have some not-normal activity. So from a scorecard perspective, we can see something very different when we had a period of activity that was anomalous. We can see the top-level aggregates of the KPIs into those groups. We have some very dark blue sections there with very high anomaly scores. So this points to a period in time where things were operating outside of normal bounds. So let's go take a look at one of those KPIs that was anomalous. So taking a look at our TCB Time KPI, we can see that this one went way out of bounds of what was normal for this period of time. This is the same KPI that we reviewed a few slides ago that was part of a normal day, but we can see that period of anomalous activity between 12:00 a.m. and 3:00 a.m. So we can start to see this KPI deviate from normal into those sections. And if we can get advanced warning when something could be going on, maybe in an application that's dependent on Db2 or one of our key business applications, if we can get that advanced warning as we start to deviate from normal, that will really help potentially save impacts to our end customers. 
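The three shaded bands on the KPI charts can be expressed as a small classification step. The sketch below is illustrative only: the thresholds of 3 and 6 standard deviations come from the description above, while the band names, the handling of values exactly on a boundary, the helper names and the sample readings are assumptions. It also shows how a contiguous run of out-of-band readings could be reported as one anomalous window, similar to the midnight to 3:00 a.m. period in the TCB Time example.

```python
def band(deviation_in_sigmas: float) -> str:
    """Map a deviation (in standard deviations) to a chart band."""
    d = abs(deviation_in_sigmas)
    if d > 6:
        return "high"      # dark blue: more than 6 standard deviations
    if d >= 3:
        return "elevated"  # middle shade: between 3 and 6 (boundary handling assumed)
    return "normal"        # light blue


def anomalous_windows(scored):
    """scored: time-ordered (label, deviation) pairs.

    Returns (first_label, last_label) for each contiguous run outside "normal".
    """
    windows = []
    start = last = None
    for label, dev in scored:
        if band(dev) != "normal":
            if start is None:
                start = label
            last = label
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows


# Made-up hourly deviations for a single KPI.
scored = [
    ("23:00", 0.8),
    ("00:00", 4.2),
    ("01:00", 7.5),
    ("02:00", 6.3),
    ("03:00", 1.1),
]
bands = [band(dev) for _, dev in scored]
windows = anomalous_windows(scored)
```

Grouping readings into windows rather than alerting on each point is one way to surface the "advanced warning" case: the window opens as soon as the KPI first leaves the normal band.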
So let's go and take a look at one of our customers that have used some of our solutions to really gain value and insight. So the customer I really wanted to highlight is APIS IT. So they really worked with our solutions, really looking at those first 3 capabilities of data streaming and looking at how we integrate and provide insights into operational data in their case, on their Splunk environment. But they really had a lot of complexity around keeping up with their hybrid-cloud workloads and environments. They had IBM Z sitting kind of in a silo, and they really had a tough time bringing everything together and really getting that hybrid picture of everything that was there. So they're really keen to IBM to really start pushing down on their AIOps transformation. So they were kind of really focused on those -- first of those 3 capabilities, so -- and working with Splunk as their analytics platform of choice to do this. So we can see a couple of the great quotes that are there from the system engineer at Apis. So -- that IZOA -- Z Operations Analytics is a tool that really simplifies the mainframe, right? So they can start to get more sense out of that data and get more value and do this in real time, so they don't have to go and dig and search. So really doing all of this analytics in real time is something that really helped this customer. And you can go and read more details of the case study there. It's at Ibm.biz/IZOAApisCaseStudy. All right. So to take a little break from the multitasking that I know a lot of you are caught up in just as I do myself. We're going to throw in a quick polling question, just to understand which of the capabilities presented will help drive your AIOps journey. Is it making data available to your analytics platforms through streaming? 
Is it helping with hybrid-cloud problem identification in your Splunk environment, or perhaps in your Elastic Stack environment? Or is it the last topic I talked about, being alerted to anomalies in your IBM Z subsystems? We'll give you just a few seconds to respond, and we'll make the results available a little later in the presentation. And with that, I'm going to go ahead and turn it back over to Jeff to talk about the next 3.

Jeff Summers;Program Director of Development

executive
#3

Thank you, Dan. Now I'll spend some time on the capabilities for performance, capacity and cost management, and starting with performance analysis. Taking a look at today's typical long-term performance and capacity planning, the approach is frequently based on batch-loaded historical data, at least a day old and sometimes more and often encompassing manual roll-ups, aggregation and analysis. Processes are typically homegrown, and tribal knowledge is far too often a key component of solution. People who are trying to make decisions are often using old and frequently inaccurate data. The IBM Z performance and capacity analytics solution captures data from a broad collection of data sources, including disparate systems and applications. Here, IZPCA excels in its ability to capture data from distributed systems, IBM I and mainframe OS, network applications and subsystems, such as CICS, IMS, Db2, MQ and more. This slide shows the architecture provided by IBM Z performance and capacity analytics. The structured data discussed on the prior slide is loaded via automatic data transfer in near real-time. For example, at the end of each SMF interval and then collated, analyzed and stored into the database. This means that the data is available for reports much faster than before. While the 3270 reports are available, the strategic platform reporting -- for reporting is IBM Cognos Analytics. This provides a rich platform for technical reporting. You'll also see that data can be streamed off-platform to other reporting tools, such as Splunk or Elastic Stack. The product provides a set of predefined dashboards on these platforms. Data can be streamed directly from IBM Z performance and capacity analytics or by using IBM Z common data provider, as Dan described earlier. The other new component you see is the Forecaster. This is a process that takes the existing data and analyzes it to create a forecasting model for future performance. 
This is a powerful component to help with predictions associated with capacity planning. Processing SMF data for analysis may seem to be expensive at times. And therefore, there have been many updates to the architecture with the automatic data transfer, the near real-time SMF collector and the Forecaster all now zIIP-enabled to lower overhead and costs. Let's look at an example of the Cognos Analytics dashboard. Here, we have a new summary workspace showing all of the key performance metrics on a particular system. Key performance metrics are out-of-the-box reports and analysis to focus on the major metrics that we believe performance analysts would be interested in, covering z/OS, hardware, storage, CICS, IMS and other domains. These have been designed for subject matter experts and the domain experts within each of these areas in order to focus on the right information. Within this report, you can then drill down and start to look into more details with a few simple clicks in order to find root cause problems or understand performance issues. Another example are reports in Cognos Analytics that define heat maps to look at service level agreement compliance. Often a performance analyst is looking at tracking SLA performance. They don't have a lot of skill in that area. And ultimately, the challenge is to monitor those SLAs to make sure thresholds are being met at all times. If there is poor SLA performance, that can lead to loss of business and reputation damage. These new predefined reports provide the essential information needed to isolate issues quickly, so you can see where you're missing particular goals either in an application or a transaction level. You can then drill down once the exception has been identified in a timely manner and with less manual effort. In this example, we're looking at zIIP usage and CPU reports as a basis for optimization. Businesses often want to ensure they can lower their costs and make effective use of their zIIP investments. 
Using these reports, we can see when and where zIIP-eligible workloads are running. This provides a benefit by being able to look at the impact on the overall workload performance and make a business decision as to whether to invest in new zIIP processors or optimize workloads elsewhere. An additional benefit of IBM Z performance and capacity analytics is that the product is continuously evaluating data for potential threshold violations and then creating and storing exceptions. These exceptions can be analyzed over time to identify trends for potential performance issues, including the ability to drill down to exception details. The next capability I'll focus on is capacity forecasting. The key value here is being able to forecast when capacity may be exceeded, using multiple forecasting algorithms and leveraging key databases of information, such as the Large Systems Performance Reference table. This enables the IT organization to take actions to either prevent the capacity from being exceeded by, for instance, moving or reducing workloads, or prepare in advance by acquiring new hardware, memory, storage, et cetera, in a timely manner to prevent disruption in service. Either action may be appropriate depending on the circumstances, but the key is that the IT organization is in control and can plan for subsequent actions. An example is the new what-if analysis delivered in the latest release. Capacity planners often need to model out how they can change environments or upgrade hardware or storage. We have the ability to show and simulate the potential differences of moving the workload from an old machine to a newer machine, for example, moving from z13 to z15, so we can understand the impact of upgrading processors on the overall performance. On the left-hand side, we can see the current analysis of the CEC and LPAR information. And on the right-hand side, we can see what we believe the performance and consumption will be if the workload is moved to the new processor.
This is very simple and easy to understand on a single workspace. Similarly, we may want to be able to track MIPS usage over time to make sure that if there are changes in behavior, we can identify the problem sooner and understand which workloads are contributing to this change in behavior. The Smart Path usage profiling looks at the hourly usage of the processor over time and builds a model to show what the average use is across every hour for every day of the week. You can then look at what the actual peak MIPS usage is compared to the expected levels and have an upper and lower boundary in order to understand where the current performance is sitting. And if it appears that a particular MIPS usage forecast will breach a capacity threshold, an exception can be raised. And the last of the 6 capabilities is cost management. If you're moving to tailored fit pricing or are considering it, you know that this changes a lot of the ways of measuring and controlling workloads within your environment. In the past, under a rolling 4-hour average, you may have done soft capping in order to keep performance to a certain level and also control costs. This does not really apply with tailored fit pricing. So there are several stages to understand how you might manage your environment to make sure your expectations are met, performance levels are maintained, and there are no unexpected costs. First, there will be a step around planning, which will be understanding your current workloads and applications, including which products are running on each LPAR. And then while they're running, you'll have ongoing analysis of the current MSU consumption levels per container and forecasts of future consumption.
And then once you understand what your current workloads are, you can start to look into how you may optimize to drive efficiency and performance improvements, identifying top consumers of MSUs that might need to be either taken offline, updated or changed in profile in some manner. Let's look at some examples of how IBM Z performance and capacity analytics can help you. In this example, we're showing a dashboard indicating the total MSU percentage utilization of enterprise containers, ENTER TR1 and ENTER PR2. We are able to show the total number of days into the contract and how much of the MSU baseline has been consumed at this point. Then we can drill down on one of the containers to look at a more detailed level, for example, the cumulative MSU usage for containers per month from the start of the tailored fit software pricing agreement up until the current date. This gives a clear view of how far through the year you are and how much of your baseline allocation you have used. From there, we can start to drill down into more granularity, to consumption at the LPAR level and then to the jobs that are running. If there was a spike, we can isolate that within a few clicks and then go down to understand what the workload might be doing. Here, we are showing LPAR MSU by container on a monthly basis. Each colored part of this stacked bar indicates a different LPAR. And then from here, we're looking at the LPAR product MSU usage by hour, so you can see which applications are using the most MSUs. This level of detail helps to truly identify our cost, as every MSU is of equal value and not affected by peak periods. We need to have a more transparent process that can help with chargeback and can be implemented depending on the needs of the business. IBM Z performance and capacity analytics is able to augment the data or annotate it with business descriptors to help group by application or other associations.
It also performs aggregation as part of its curation process. This means it is efficient in getting hourly, daily, weekly or monthly roll-ups of data. When we come to looking at longer-term trends, this is vital. Another key part when looking at tailored fit pricing is understanding when we expect to meet or exceed our MSU baseline for the year. Using the forecasting capabilities, we're able to make an accurate prediction of that date. It is for the business to decide if that aligns with their goals. Maybe we have increased workloads, which means increased business, and therefore, the additional MSU consumption is of value. If you want to be able to reduce the number of MSUs and bring it more in line with the targeted baseline, then you are given time in order to do this because you have the visibility in advance. And with that, back to the cost. So for example, if we're being asked by the business to support a new sales event in retail, can we estimate the impact on workloads? Using IBM Z performance and capacity analytics, we are able to see which applications are affected, what the workload growth is and start to make informed decisions about how this would work and tie it back to capacity planning, so that we know we can support the changes and not be resistant to change within the business. IBM Z performance and capacity analytics sits at the heart of this to help identify all of these parts. By looking at what's running in each container, you can identify these top consumers and then from there make decisions around your organization's strategy. For example, reducing application MSU consumption, eliminating redundant tasks and workloads and identifying additional optimizations. Consider an example where you may have an application running outside of the peak period that has not been updated for some time, perhaps a batch job or COBOL application.
Maybe by recompiling into the latest level of COBOL, the amount of consumption by the application can be reduced and thereby freeing up MSUs for more critical workloads. Here is an example of a customer who's had success with the product, a large U.S.-based IT service provider who has multiple clients of their own, and they were using multiple tools across these accounts in order to manage their environment, understand performance and make capacity planning decisions. What they were looking for was a single reporting tool for performance analysis and capacity planning. And using IBM Z performance and capacity analytics, they had near real-time access to these metrics and allow them to make strategic decisions about the capacity in their environment across all of their customers. This also resulted in consolidation down to a single tool to make this easier and allow them to increase flexibility and analysis across their business. Now let's take a minute for a polling question regarding the performance analysis, capacity forecasting and cost management capabilities. As I mention each of the responses, please take a moment to select the ones that apply. So which of the capabilities presented here will benefit your performance and capacity management: Shifting performance analysis from batched latent data to automated near real-time data; consolidated key performance metric views with drill down to system and resource details; using analytics to forecast capacity impacts via what-if analysis of workload increments; a granular view of the workloads that are directly impacting MIPS consumption costs. Take just a second and complete the responses. As we pull all these capabilities together, you can see that there are multiple stakeholders that are leveraging, are affected by and making critical decisions based on Z operational and performance data to support the business. 
And we believe IBM Z Operations Insight Suite, comprising IBM Z Operations Analytics and IBM Z performance and capacity analytics, provides a complete solution for IT operations, performance and capacity management. Here are some links to learn more about the capabilities included in IBM Z Operations Insight Suite: the APIS IT case study that Dan mentioned in his presentation, a white paper describing IBM Z performance and capacity analytics and tailored fit pricing, as well as a link to our IBM Z software newsletter, operations and management edition. And with that, Dan and I greatly appreciate your time in attending the session, and we have included our contact information in case you would like to reach out to us. At this time, we'd like to open up for questions.
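The usage profiling described earlier in this segment (average use for every hour of every day of the week, with an upper and lower boundary) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's algorithm: the names `build_profile` and `check`, the input of (timestamp, MIPS) pairs, and the boundary of two population standard deviations around the hourly mean are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_profile(samples, width=2.0):
    """Build an hourly usage profile from (datetime, mips) samples.

    Returns {(weekday, hour): (lower, expected, upper)}, where weekday is
    0 for Monday. `width` is the number of standard deviations used for
    the boundaries (an assumed value, not taken from the talk).
    """
    buckets = defaultdict(list)
    for ts, mips in samples:
        buckets[(ts.weekday(), ts.hour)].append(mips)

    profile = {}
    for slot, values in buckets.items():
        expected = mean(values)
        spread = pstdev(values)
        lower = max(0.0, expected - width * spread)  # usage cannot be negative
        upper = expected + width * spread
        profile[slot] = (lower, expected, upper)
    return profile


def check(profile, ts, mips):
    """Compare one observed reading with the expected range for its slot."""
    slot = (ts.weekday(), ts.hour)
    if slot not in profile:
        return "no-profile"
    lower, _, upper = profile[slot]
    if mips > upper:
        return "above"
    if mips < lower:
        return "below"
    return "within"


if __name__ == "__main__":
    # Three made-up Monday 09:00 readings.
    history = [
        (datetime(2020, 5, 4, 9), 1000),
        (datetime(2020, 5, 11, 9), 1100),
        (datetime(2020, 5, 18, 9), 900),
    ]
    profile = build_profile(history)
    print(check(profile, datetime(2020, 5, 25, 9), 1500))
```

A reading flagged as "above" would correspond to the kind of deviation from expected levels that the speaker says can raise an exception.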

Daniel Wiegand;Senior Offering Manager

executive
#4

All right. So thanks, everyone, again, for attending the session today. This is Dan again. I just wanted to review some of the polling results from the first polling question. Of the responses that we got, 63% thought that data streaming and making data available to analytics platforms is very important. But that was actually the number 2 response. The work we're doing with anomaly detection and integrating machine learning technology into the solution is really the biggest key: 75% of the respondents thought that was important. That's really where we're continuing to make a lot of investment. So as we outlined in our statement of direction, we went and built this, and we're looking at doing other subsystems next. So keep an eye on this space from us. And with a lot of what we build around the platform, we always invite customers to participate with us in the development labs to help shape and build the solutions. And we couldn't have gotten to where we are today without some of the great sponsor customers that we have and the relationships. And a lot of those customers were able to send their data to really help refine and build a lot of what we have. So if you're interested in this space, please feel free to reach out to me or your IBM account rep, and we'll be happy to start to engage and work with the labs. And with this, I'll let Jeff go ahead and talk about some of the results that he's seen as well.

Jeff Summers;Program Director of Development

executive
#5

Okay. Thanks, Dan. Yes. So just getting the final results in on the second polling question. We had a high response, with 50% to 58% coming back on the first 3. So there was a lot of interest, in particular, around those. And I think that really reflects, first of all, the need to have access to more near real-time data, as opposed to working on capacity planning from latent data or yesterday's data, and being able to forecast what could be happening in the future. The highest one was actually the second one, the key performance metrics: making the data available in a way that's easily understandable, because IT operations teams are having to do more, sometimes with fewer people or with people who are sharing responsibilities across multiple areas. So making that as clear as possible was definitely the higher one. So very good feedback. I appreciate that. Okay. Do we have any questions that have come in?

Unknown Executive

executive
#6

I'm just responding to a few.

Daniel Wiegand;Senior Offering Manager

executive
#7

We've had some great questions. And a couple of more around -- if you guys want more information, my contact information as well as Jeff's is on the slide that's on your screen. Please feel free to reach out and contact us directly. We're happy to provide additional information or through your IBM account team as well.

Unknown Executive

executive
#8

We've got a couple of more questions popping up on the screen here. Guys, do you want to handle those or do you want to handle those in the Q&A box and down line to do that? We can run a couple of minutes over, so just need to decide. Or otherwise, let the user know that we're going to get back to him with answers for unanswered questions.

Daniel Wiegand;Senior Offering Manager

executive
#9

Yes. It looks like Chris is going to go ahead and answer those questions. So I had one about the performance of our common data provider. And kind of how much CPU was actually used? And the answer is it's -- I don't have those specific numbers, but there really is a lot of variance in what the measurements and results are. And a lot of it is really dependent on the system characteristics, which is why in this slide and in the details, we tried to be as prescriptive as possible with what our environment configuration was when we went and ran those tests. And a lot of what we had done was really looking at a large amount of data. So one of the key points was the 300 gigabytes of data for the CICS and Db2, really trying to mimic what a customer production workload would look like. So something that we had tried to go through there. So performance for me is one of those things that you don't do once and check the box and say you're done. It's something that we continually try to push and find new ways to enhance how our customers are using this. And while we're answering a couple of these last questions, Tony, do you want to chat real quick about looking at the events for next week? I know we have some additional...

Unknown Executive

executive
#10

For next week?

Daniel Wiegand;Senior Offering Manager

executive
#11

Yes.

Unknown Executive

executive
#12

Thanks, Oliver. Yes. Tune in next Thursday. We have Jordan Cain and Joe Winchester, and their topic is how Zowe bridges the mainframe and the cloud for a new generation of developers and system programmers. So that is going to be an exciting topic. Zowe is a hot topic lately. So I'm glad to have those speakers on board for next Thursday. We do have a survey. You can evaluate today's session. When we are done here, it will pop up automatically. Please fill out your comments. We'd love to hear feedback on how we could make these better. And lastly, we will have a replay if your colleagues want to see the charts and hear the presentation, or if you want to go back and hear everything again, which, of course, makes a lot of sense given how much information there was today. We will have a replay ready by tomorrow, I think. Oliver, is that right? Tomorrow, we'll have it?

Unknown Executive

executive
#13

That is correct, yes.

Unknown Executive

executive
#14

Yes. Perfect. All right. Back to you, Oliver.
