International Business Machines Corporation (IBM) Earnings Call Transcript & Summary
August 7, 2020
Earnings Call Speaker Segments
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Ladies and gentlemen, good morning. Welcome to the IBM webinar, Automate IT Operations with AIOps for Lower Costs and Greater Resiliency. In today's webinar, there are 2 sessions: AIOps: Automate IT Operations for Lower Costs and Greater Resiliency, presented by Matthias Funke, Executive Director, AIOps, Cloud, IBM Worldwide; and Potential Improvements on Operational Efficiency with AIOps, presented by Francis Wong, Program Director, Hybrid Data Management Cloud Services, Data & AI, Cloud, IBM Canada. Throughout the webinar, if you have any questions, please feel free to type them in the Ask a Question box shown on the screen, and we'll get back to the questions at the end of the webinar. Without further ado, I'd like to hand over to Matthias for the first presentation. Matthias, over to you.
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Hi. My name is Matthias Funke, and I lead Product and Strategy at IBM for AIOps. Today, I'd like to introduce to you the AIOps topic and share with you why we are so excited about the opportunity that AI and automation bring to the IT operations domain. To understand AIOps, let me take us back 50 years and take a look at the movie 2001: A Space Odyssey. It was a groundbreaking, iconic movie created by Stanley Kubrick. And what made it so exciting, especially for me, was the character HAL, a first representation in the movie of artificial general intelligence, a character that was tasked with ensuring that the ship the crew was traveling on through space is up and running and remains healthy throughout the journey. HAL interacts with the crew through audio and video. HAL understands the sentiment and the feelings of the crew. HAL's job and HAL's goal is to keep that ship up and running. Unfortunately, that does not necessarily include keeping the crew alive, which at some point in the movie turns out unfavorably for one of the crew members. So HAL's role was that of a gatekeeper, but also a kind of analyst, diagnosing issues as they relate to the infrastructure of the ship and repairing them or helping inform the crew on what to do to repair those issues. Going back to the present, we think we have now approached a maturity of artificial intelligence that helps us realize many of the characteristics that were depicted in the movie. As Gartner predicted, AI and automation will make a profound difference to the way people manage incidents and also anticipate, predict and avoid them before they occur. Now why AIOps? Because IT operations continue to be a challenge for many organizations. If anything, these challenges have been growing over the last decade or so as systems become more complex and as applications move to the cloud and become hybrid and cloud-native in nature.
And organizations continue to struggle to balance the capacity tied up in existing infrastructure and applications, keeping the lights on, against the capacity that they dearly want and need to support and drive new initiatives for the business and respond to client or line-of-business requirements faster. The inherent complexity of some of these applications prevents users from applying traditional means to understand the health or the state of an application. You cannot code rules against unknown unknowns. But AI can help you understand anomalies buried within unstructured data like application logs, or can correlate different data points and anomalies across a vast variety of heterogeneous data points. From the user perspective, a lot of this domain requires activity that we call toil, right? Work that is repetitive in nature, that is a good candidate for AI and automation, and that contributes today to this process being very labor-intensive. You have many people responsible for keeping an application up and running. And as applications become more complex, the skills required to diagnose issues as they occur across a complex application stack are scarce. Very few people in the organization have the skill to deal with those complex issues. And we believe that AI can make a profound difference here as well, lowering the skills required to diagnose complex problems, but also reducing the effort required to diagnose and resolve these issues. A lot of benefits have been realized with the transformation away from a traditional siloed, waterfall-ish delivery model towards an agile delivery model, leveraging DevOps best practices and methodologies. But with that parallelism of some tasks and activities, we also introduced new risks, right?
As you move faster, as things are happening in parallel, there are risks that the left hand doesn't know what the right hand is doing and that we are moving too quickly sometimes, right, in rolling out an update, which can lead to a significant client-impacting or revenue-impacting outage. So there is tremendous opportunity to foster the transformation towards an SRE-oriented and application-centric operating model. And there's also a lot of opportunity to miss out, right, and to continue to see significant impact if you don't embark on this journey. It is no secret that this can happen even to the most competent, IT-savvy organizations. Many large revenue-impacting and client-impacting outages happen every year, some of which are depicted here on this chart. But those brave enough to embrace AIOps will have the opportunity to realize profound benefits. Double-clicking on the incident management or resolution use case, what's really new and what's really the opportunity for AI now is to bring unstructured data into the equation. The ability to understand anomalies in unstructured application logs or similarities between tickets makes a profound difference to the experience of the user in this process and to the information that is available to the user to make a decision about what has happened and what action to take. Let me give you an example, and compare and contrast the experience that many users have today with what AI could bring to these users, right? So today, the ITOps engineer deals with many different tools.
And if there is an incident or issue occurring in the environment, it probably starts somewhere in the form of an alert. The engineer gets notified through a tool, goes back to the user interface of that tool, tries to understand what happened at the timestamp for which the alert was triggered and then very likely has to jump over to a different tool, like a log analysis and aggregation tool, to look again at what happened around this time frame for the application. He's trying to figure out which elements of an application stack are affected, trying to connect the dots as different tools and different pieces of an application stack might emit events or alerts. But it's not clear upfront which of those relate to the same incident. So there's a lot of tool hopping going on by the engineer. The example that you see on the bottom right of this chart is a real-life example from one of our application services. It took that DevOps team close to 5 hours to diagnose the incident and then take the right action to resolve it. When we simulated the effect of infusing AI into this workflow, the time to resolve this incident would have been reduced to 14 minutes. When you think about the incident management workflow with its 5 steps at the bottom of this chart, the opportunity is significant to reduce the time it takes to detect, isolate and diagnose problems. In general, that not only brings the time to resolution down toward near real time, but it also reduces the time it takes to fix, verify and, ultimately, resolve or remediate an incident. With our newly introduced Watson AIOps offering, we are able to deliver the experience and the benefits that I just outlined. How do we do it? We have 3 key value points.
Number one, we have best-of-breed artificial intelligence and machine learning models that help us understand unstructured data and what normal means when you think about application logs, or what abnormal situations are. All of this is unsupervised learning and doesn't require an understanding or coding of rules against these unstructured data sets. And as such, it helps us with the risk that comes from dark debt, right? Those unknown unknowns, the fact that there is inherent complexity that we cannot really anticipate. The AI will help us identify those incidents or those issues and make them subject and input to an analysis process. And that analysis process is also powered by AI, in a way that we use real-time and streaming technologies to correlate data in real time across various unstructured and structured data sources, to better segregate signal from noise and better understand what requires the attention of an engineer. We then leverage AI to extract entities and valuable information and bring that together in the form of a story to the engineer in ChatOps. So think of AI as your companion that proactively taps you on the shoulder and explains to you: here is what just happened, here is why it's happening, what is impacted, what's the blast radius, and based on the symptoms that were extracted, what do I recommend as a next best action. All of this empowers the engineers to get their job done more effectively. Now let's take a look at what this experience looks like in the ChatOps tool I select. [Presentation]
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Hope you saw Watson AIOps in action. Let's take a brief look at how Watson AIOps is positioned to deliver this experience. What you can see on the left-hand side of this slide are all the different structured and unstructured data channels that serve as input to AIOps. Many of these channels are already represented by tools that aggregate information for an environment or group events together, right? So Watson AIOps can tap into and integrate with these existing tools through a Kafka integration. As these tools emit data, Watson AIOps will pick them up in real time, in a streaming fashion via Kafka, and correlate and infer those data points to assess and understand what requires attention. It then formulates a story, or augments an existing story as you saw in the demo, and surfaces that in ChatOps. It can also expose those insights in a machine-consumable form, right? That allows you to feed that insight back into a process or an application of your choice that your engineers are used to. And we already have clients that see the value of this offering and the application of AI: one is CaixaBank in Spain, and one is Dynata, a U.S.-based marketing company. Both companies see the benefits of Watson AIOps and its ability to discover anomalies from unstructured data and also discover similarities between historical tickets that share symptoms of the incident that is currently under evaluation. All of this helps the teams of these companies to more quickly identify the root cause of a certain incident and what needs to happen to fix those issues. We are also using Watson AIOps internally. Among our most prominent services on IBM Cloud are Db2 on Cloud and Db2 Warehouse on Cloud. These DevOps teams have tremendous challenges with the vast amount of data under management and the many clients they serve.
So when they got their hands on Watson AIOps, they could see the benefit of Watson AIOps bringing in additional data points from certain elements of the solution architecture that they were previously not able to deal with, right, because of their unstructured nature. Now it helps them to get better insights on what's going on and what action to take. So why should you be excited about the opportunity to infuse incident management with AI? And why should you be interested in trying out Watson AIOps and getting a deeper experience with it? It's very simple. The benefits that we can deliver with Watson AIOps are profound, both from a cost reduction perspective and with the idea that you can shift capacity over to tackle new initiatives more effectively and more quickly. And not just from a management perspective, but also from a user perspective: the unsupervised AI and machine learning approach that Watson AIOps brings gets you to value quickly and delivers an ROI more quickly, but it also delivers a profoundly different user experience for your engineers, one that helps them manage their work-life balance more effectively and retain high morale as they go through the incident management workflow, and that can reduce the number of iterations and investigations required to diagnose an incident. All of that helps the engineer, helps with morale, helps retain highly qualified people, and gets you, whether you are the CIO or the management team, into the driver's seat to become more proactive for your business. So it's a great story, and I'm looking forward to hearing back from many of you who want to engage with us to take the next step. Thank you very much.
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Coming up next, we'll have Francis Wong to present the topic, Potential Improvements on Operational Efficiency with AIOps. Over to you, Francis.
Francis Wong; Program Director, Hybrid Data Management Cloud Services, Data & AI, Cloud, IBM Canada
Hello. My name is Francis Wong. I lead the DevOps organization for IBM's Db2 on Cloud and Db2 Warehouse on Cloud products. Db2 on Cloud is a fully managed cloud database for transactional workloads. And Db2 Warehouse on Cloud is a fully managed cloud database for warehouse workloads. Both products are built on IBM's flagship relational database engine, Db2. Between these 2 products, we have over 30,000 customers worldwide. Our customers come from different industries, including banking, retail, government and health. And the size of the customers ranges from international enterprise customers all the way down to small and medium-sized local businesses. And as you can imagine, a database really is the bedrock for any application. It absolutely cannot be down at any time. A database may be lacking a feature or two, but as long as it's fully operational, that's what matters. So any advantage I can find to keep up the uptime of my databases, I'll be looking for that. Let me start by talking a little bit about my team. My team has 80 people worldwide, and my team takes care of both the operation and development aspects of the business. Right now, about 50% of my team's time is actually spent on the operational aspect of the business, which includes running the systems, keeping them up and running 24/7, 365, and also handling any customer situations, customer questions and any outages of that sort. Any chance I get to reduce the operational time, that time saved can be better spent improving the product. And in an industry as competitive as mine, where we've got competitors from all the major cloud vendors, any advantage I can get to improve my product will give me a leg up in my business. Back in March of this year, Matthias approached my team to see if we were interested in doing a beta program for AIOps. And the answer was simple. Of course, it was yes. In fact, my team had been trying to leverage AI and ML to improve the operational efficiencies of the team.
However, we found that it was actually hard to do because of a couple of things. One, my team, we're experts in databases and also in running the cloud business. So while IBM is a leader in AI, my team, we just certainly don't have the expertise for it. So having a package that we can just leverage to incorporate AI into our business, that's huge. The premise of the beta was simple. We wanted to see whether the AIOps engine can actually detect a failure before it occurs. We wanted to keep the beta small, just to prove out that one point. What we did was we sent the past logs for one particular system, for which we know the history of incidents very well, over to the AIOps team. And the AIOps team would feed the logs into the engine and see whether it could detect some of the incidents. And the results were quite impressive. The engine was actually able to predict 1 particular file system failure 47 minutes before the incident would have occurred. So 47 minutes. That's huge. That may have given my team enough time to actually fix the system before the outage occurred. Or at the very least, my team could have informed the customer and warned them, such that my customer could have put a message on their website or could have run their DR process to fail over to a DR site. But unfortunately, that particular incident took my team 3 hours to recover from. If we had had AIOps, in theory, we could have reduced the outage time by at least half, because AIOps would have given us insight into exactly which part of the system was failing and guidance on what the recovery procedure might be. So given the great result from the beta program, we're now moving to the pilot program with AIOps. The difference between the pilot and beta program is simple. One, instead of sending in past logs, we will now be sending logs to AIOps in real time.
And two, instead of one system, we'll now be expanding the program to a subset of my production systems. What I want to see from the pilot program are 2 things. One, I want to see AIOps actually detect and prevent outages before they occur. That would be huge. And two, in the cases where AIOps cannot prevent outages, I want to leverage AIOps' insight into the issues: first, by helping me pinpoint where the issues are from its log analysis, and second, by telling me whether similar incidents have occurred in the past and what the solutions might be. Now if the pilot program is successful, my goal is to be able to roll AIOps out to all of my production systems. Ultimately, what I want to leverage AIOps for is this: to increase the uptime of my products and to decrease the time my operators spend running the systems and resolving customers' issues, so that they can spend their time just building a better product. Better uptime, a better product and a happier team should ultimately lead to happier customers.
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Hello. Good morning, everyone. Thanks again for calling in for today's webinar. I'm Aramis from IBM Marketing. So I guess a lot of you might have some questions in mind. [Operator Instructions] So let me check what questions we have in front of us now. The first question is: does the solution monitor and detect issues in real time, and can you elaborate on that? So maybe over to you, Matthias?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Yes. Thank you. And can you hear me, Aramis? Okay. Good. Yes. So great question. The short answer is yes. The longer answer is that, yes, this is a real-time analytic streaming solution where all the data points that we ingest through a Kafka queue, in the form of unstructured log entries or structured events or alerts, are, let's say, triaged and inferred in real time. There's no data lake that we first have to land the data in before applying machine learning models to it. It's all happening in real time.
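Editorial sketch (not from the speakers): the idea of scoring each data point as it arrives, rather than landing it in a store first, can be illustrated with a small online detector. This is a minimal sketch under stated assumptions and is not how Watson AIOps works internally: the `OnlineZScore` class, the `detect` function, the threshold and the per-minute error counts are all hypothetical, and a plain Python list stands in for records consumed from a Kafka topic.

```python
import math
from typing import Iterable, Iterator, Tuple


class OnlineZScore:
    """Running mean and variance (Welford's algorithm) for scoring a stream."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # hypothetical cut-off, in standard deviations
        self.warmup = warmup        # number of points to observe before scoring
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def score(self, x: float) -> float:
        """Z-score of x against the baseline so far (0.0 during warm-up)."""
        if self.n < self.warmup:
            return 0.0
        std = math.sqrt(self.m2 / max(self.n - 1, 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return (x - self.mean) / std

    def update(self, x: float) -> None:
        """Fold x into the baseline in constant time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def detect(stream: Iterable[Tuple[int, float]],
           detector: OnlineZScore) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, value) pairs that deviate strongly from the baseline."""
    for ts, value in stream:
        if abs(detector.score(value)) >= detector.threshold:
            yield ts, value          # anomalous: keep it out of the baseline
        else:
            detector.update(value)   # normal: learn from it


# Hypothetical per-minute error counts extracted from application logs.
events = [(0, 2), (1, 3), (2, 2), (3, 4), (4, 3), (5, 2), (6, 40), (7, 3)]
anomalies = list(detect(events, OnlineZScore()))
print(anomalies)
```

Each point is scored and then either flagged or folded into the baseline, so nothing needs to be stored beyond three running numbers. In a real deployment the list would be replaced by a consumer loop over a message queue.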
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Okay, that's clear. So the other question in front of me is, does the solution have the ability to analyze historical data? And how is it done?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Yes. So I guess it's the flip side of the question that was just asked. We are not analyzing the historical data in the form of logs. We believe that that data is collected and aggregated in a log management tool that you already have in place. The events would also be aggregated and grouped in different parts of your monitoring and management solution. This is really meant to focus on the real-time aspect. So think of Watson AIOps primarily as a machine learning and AI capability that, in real time, looks at the data and correlates it to the point where it decides that this is something that requires attention, and then brings that into ChatOps. However, there is one element of historical analysis when you think about the ticketing information that you saw in the demo, right? Once we identify the symptoms that could represent a certain incident at the present time, we can help you look into your ticketing system, like ServiceNow or Remedy or something else, and Watson will perform an intelligent search and will use NLP and NLU capabilities to intelligently extract from unstructured descriptions of historical tickets whether there is precedent. So if there are other tickets that share the same symptoms, we will bring them back to the SRE or the user, and we will allow them to determine whether there was precedent and whether there was already action taken that they could just repeat or follow in the form of advice. So it helps them make the right decisions on how to resolve the problem.
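Editorial sketch (not from the speakers): the search for historical tickets that share symptoms with a current incident can be illustrated with a simple text-similarity ranking. This is a minimal sketch under stated assumptions: bag-of-words cosine similarity is a deliberately simple stand-in for the NLP and NLU capabilities mentioned above, and the ticket IDs, ticket descriptions and the `similar_tickets` function are hypothetical and do not correspond to any ServiceNow, Remedy or Watson API.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_tickets(symptoms: str, tickets: Dict[str, str],
                    top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank past tickets by textual similarity to the current symptoms."""
    query = tokenize(symptoms)
    scored = [(tid, cosine(query, tokenize(desc))) for tid, desc in tickets.items()]
    scored = [(tid, s) for tid, s in scored if s > 0.0]  # drop unrelated tickets
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical historical tickets exported from a ticketing system.
past_tickets = {
    "TKT-101": "database file system full, writes failing on data volume",
    "TKT-102": "login page slow after certificate renewal",
    "TKT-103": "file system usage alert on database node, cleaned old logs",
}
matches = similar_tickets("file system on database node is filling up", past_tickets)
print(matches)
```

The engineer would then open the top-ranked tickets to see what action resolved them. A production system would need stop-word handling, synonyms and entity extraction, which this sketch leaves out.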
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Okay. I see. The other question I can see is: if my organization were to deploy such a solution, meaning the Watson AI solution, how soon can we realize value from it?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Great question, because we believe that one of our key differentiators with this solution is the time to value. You are not required to define or code rules against certain anomalies, or to train the AI in a supervised fashion in terms of what normal means or what abnormal means. All the algorithms that you saw in action, and that I talked about during my overview, are really unsupervised machine learning models that we train against historical data, which we need initially. And then once the system is up and running, any data that comes in going forward will be used to retrain the models and ensure that the quality and the accuracy of these models are kept at a high level.
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Okay. I see. So there's another question -- oh, this is from an IBM iSeries customer. Can this solution be installed in IBM iSeries?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Yes. The solution is really agnostic about the tool chain that we integrate with or the applications that we want to monitor and manage. It's totally industry agnostic. It's agnostic to the platform on which your application runs, whether it's iSeries or Z, whether it's a distributed platform, or whether it's a cloud-native application running on any major cloud service provider. There's no difference in the way the AI would think about the data coming from these systems.
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
I see. I see. So the other question is, will this solution flood my IT staff with a lot of different alerts?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
I love the question, right? Because we believe one of our key benefits with Watson AIOps is the ability to further segregate signal from noise. Take what you saw in the demo, for instance. The story that you saw there is really a story that might be triggered initially by an anomaly that we identify from application logs. But if there are alerts or other events that are also triggered by other elements of your application stack and by other tools that are listening in on these stacks, then we will associate them with the original incident. So we actually reduce the amount of noise, and we help you focus on that specific incident. We orient all these data points around a specific incident, and that makes it easier for the SRE or the ITOps engineer to determine what is really causing it and what action to take.
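Editorial sketch (not from the speakers): associating alerts from several tools with one incident can be illustrated by grouping alerts that concern the same application and arrive close together in time. This is a minimal sketch under stated assumptions: the `Alert` and `Incident` types, the `correlate` function, the 300-second window and the sample alerts are all hypothetical, and real correlation engines use far richer signals, such as topology and text similarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    ts: int        # seconds since an arbitrary start time
    app: str       # application the alert refers to
    source: str    # tool that emitted the alert
    message: str


@dataclass
class Incident:
    app: str
    alerts: List[Alert] = field(default_factory=list)

    @property
    def last_ts(self) -> int:
        return self.alerts[-1].ts


def correlate(alerts: List[Alert], window_s: int = 300) -> List[Incident]:
    """Attach each alert to the open incident for its app if it arrives within
    window_s seconds of that incident's latest alert; otherwise open a new one."""
    incidents: List[Incident] = []
    open_by_app: Dict[str, Incident] = {}
    for alert in sorted(alerts, key=lambda a: a.ts):
        current = open_by_app.get(alert.app)
        if current is not None and alert.ts - current.last_ts <= window_s:
            current.alerts.append(alert)
        else:
            current = Incident(app=alert.app, alerts=[alert])
            incidents.append(current)
            open_by_app[alert.app] = current
    return incidents


# Hypothetical alerts emitted by three different tools.
raw_alerts = [
    Alert(0, "db", "log-anomaly", "unusual error pattern in database logs"),
    Alert(60, "db", "monitoring", "file system usage above 90%"),
    Alert(100, "web", "monitoring", "certificate expires soon"),
    Alert(200, "db", "apm", "query latency degraded"),
    Alert(2000, "db", "monitoring", "backup job finished late"),
]
incidents = correlate(raw_alerts)
for incident in incidents:
    print(incident.app, [a.source for a in incident.alerts])
```

Five raw alerts collapse into three incidents here, with the three related database alerts presented together. The engineer sees one story per incident instead of one notification per tool.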
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Got it. So far, the final question I see in the chat box is: does Watson AIOps augment or support APM? And can you elaborate?
Matthias Funke; Executive Director, AIOps, Cloud, IBM Worldwide
Yes, APM. I guess the person asking the question is talking about application performance management. So the answer is yes, right? Because often what we see, also with many other clients that we worked with on this, is that many of these problems are performance-related. So they see a slowdown in the application performance that is not explained or is not explicable. And they are looking for answers, right? And actually, 1 of the 2 reference case studies that you saw in my presentation dealt with one of these performance-related issues, where the client had a big degradation of performance and an automatic recovery for one of their critical applications. And over a long period of time, they could not explain what caused this deterioration. So Watson AIOps helped to determine what actually happened in the application, and they now have an idea of how to solve this.
Aramis Wong; Marketing Manager, IBM Cloud & IBM Services
Okay. So thank you, Matthias. So far, I think all the questions that we have in the questions box have been answered. Maybe our customers do still have some questions in mind, but don't worry about it, because our representative will be reaching out to you via phone or via e-mail in the next couple of days. So if you do have any questions, or you do want to know more about this solution or any other type of IBM solution, please feel free to reach out to us. Okay. Thanks again for tuning in today. We'll see you again at our next webinar or live event. Bye-bye.