May 5, 2020, by Jessica Rockwood

When a component in an application’s end-to-end workflow becomes unavailable and affects internal users or external clients, the clock starts ticking, and customer satisfaction can suffer significantly. Market research firm Aberdeen pegs the cost of an outage at about $260,000 per hour. Yet many businesses are ill-equipped to resolve such outages promptly.

But there’s some good news for frustrated CIOs. Market intelligence firm IDC predicts that, by 2024, enterprises that are powered by AI will be able to respond to customers, competitors, regulators, and partners 50% faster than those that are not using AI.[1]

Dispersed Data, Imminent Problems

The central problem many IT departments face is that the modern enterprise constantly processes vast volumes of data from many sources, and traditional data analysis techniques and applications cannot monitor that data in real time. When an issue does occur, it can take hours or even days to troubleshoot its root cause.

There is reason for optimism here as well. Over the last decade, the IT industry has seen the rise of a new set of frameworks for IT operations: DevOps and DataOps revolutionized the way IT departments work with the rest of the enterprise. Now a newer industry methodology, AIOps, further extends the IT department’s ability to respond to change and address issues in real time.

What Can AI Do?

AI is quickly becoming an essential component of today’s IT departments because it can automate how enterprises detect, identify, and respond to potentially costly or catastrophic IT anomalies, either while an event unfolds or even before it occurs. AI solutions can handle vast volumes of structured and unstructured data that traditional system monitoring tools were not designed to oversee in a single view.
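Spotting an anomaly in a metric stream is a common starting point for this kind of automation. The following is a minimal sketch using a rolling z-score, a simple statistical baseline chosen for illustration; it is not the technique Watson AIOps uses. The function name `detect_anomalies`, the window size, and the threshold are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate sharply from the trailing window.

    A sample is flagged when it lies more than `threshold` population
    standard deviations from the mean of the previous `window` samples.
    """
    history = deque(maxlen=window)  # trailing window of recent samples
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                if abs(value - mu) / sigma > threshold:
                    anomalies.append(i)
            elif value != mu:
                # A perfectly flat window: any change at all is unusual.
                anomalies.append(i)
        history.append(value)
    return anomalies


latencies_ms = [120, 118, 122, 119, 121, 120, 950, 121, 119]
print(detect_anomalies(latencies_ms))  # prints: [6]
```

Note that the spike itself enters the window, which widens the baseline for the next few samples; a production detector would typically exclude flagged values or use a more robust statistic.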

AI can collect data from a heterogeneous array of sources across the IT infrastructure, from performance alerts to incident tickets. That data can be put to work in several ways. For example, AI can reduce costs and improve productivity by recognizing the times of day when demand on IT resources is low and shifting compute resources automatically. If automatic adjustments are not desired, the data can instead be presented visually, giving IT operations managers or Site Reliability Engineers (SREs) recommended courses of action along with the rationale behind each recommendation.

AI can also automate routine remediation tasks, such as shifting traffic from one router to another, freeing up space on a drive, or restarting an application. AI systems can even be trained to self-correct, so IT managers and their teams can spend their time on higher-value work while still getting full visibility into the enterprise’s operations.
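The time-of-day scenario can be made concrete with a small example. This is a minimal sketch, assuming hourly utilization samples have already been collected: it finds the hour with the lowest average demand and returns either an automatically applied change or a recommendation with its rationale for an operator. The names `plan_scale_down` and `Recommendation` and the 20% threshold are hypothetical, not part of any IBM API.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    hour: int        # hour of day (0-23) with the lowest average demand
    action: str      # proposed change to compute capacity
    rationale: str   # explanation shown to the operator or SRE
    applied: bool    # True if the change is made automatically


def lowest_demand_hour(hourly_utilization):
    """Return (hour, averages) for the hour with the lowest mean utilization.

    `hourly_utilization` maps an hour of day to utilization samples
    (0.0-1.0) collected on different days.
    """
    averages = {h: sum(v) / len(v) for h, v in hourly_utilization.items() if v}
    return min(averages, key=averages.get), averages


def plan_scale_down(hourly_utilization, threshold=0.2, auto_apply=False):
    """Propose a scale-down for the quietest hour, or None if none is needed."""
    hour, averages = lowest_demand_hour(hourly_utilization)
    avg = averages[hour]
    if avg >= threshold:
        return None  # demand never drops low enough to justify a change
    rationale = (f"Average utilization at {hour:02d}:00 is {avg:.0%}, "
                 f"below the {threshold:.0%} threshold.")
    # A real system would call an orchestration API when auto_apply is True;
    # this sketch only records which mode was chosen.
    return Recommendation(hour, "reduce compute capacity", rationale, auto_apply)


usage = {2: [0.05, 0.07, 0.06], 14: [0.80, 0.75, 0.90], 20: [0.40, 0.50]}
print(plan_scale_down(usage).rationale)
# prints: Average utilization at 02:00 is 6%, below the 20% threshold.
```

Returning the rationale alongside the action means the same result can drive either an automatic change or a dashboard entry, matching the two modes described in the text.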

Introducing IBM Watson AIOps

We’re excited to announce Watson AIOps, a new product that uses machine learning, natural language understanding, explainable AI, and other technologies to automate IT operations. Drawing on advances made at IBM Research, Watson AIOps gives businesses the ability to address and shape future outcomes, moving from a reactive posture toward proactive strategies. It is designed to deliver cost and personnel efficiencies, improve resilience across the enterprise’s information architecture, and speed issue resolution.

Watson AIOps is trained to connect the dots across data sources and common IT industry tools in real time, helping teams quickly detect and identify issues. It extends beyond traditional structured sources of operational data, such as metrics and alerts, to semi-structured and unstructured data, such as logs and tickets. It then combines these sources using machine learning and natural language understanding into a single, holistic problem report that helps teams identify and address the situation.
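Combining structured and unstructured sources starts with mapping them into one shape. The following is a minimal sketch, assuming a hypothetical `Event` schema and a simple log format: alerts map over directly, log lines are parsed with a regular expression, and tickets are matched to services by keyword. The keyword matching is a crude stand-in for natural language understanding, not how Watson AIOps processes tickets.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """Common record for data from any source (hypothetical schema)."""
    timestamp: datetime
    service: str
    source: str   # "alert", "log", or "ticket"
    summary: str


def from_alert(alert):
    """Structured alert: the fields are already separated."""
    return Event(datetime.fromisoformat(alert["time"]), alert["service"],
                 "alert", alert["message"])


# Assumed log format: "<ISO timestamp> <service> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+(ERROR|WARN)\s+(.*)$")


def from_log(line):
    """Semi-structured log line: pull out fields with a regular expression."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # not a warning or error in the assumed format
    ts, service, _level, text = match.groups()
    return Event(datetime.fromisoformat(ts), service, "log", text)


def from_ticket(ticket, known_services):
    """Unstructured ticket: guess the service by simple keyword matching."""
    text = ticket["description"].lower()
    service = next((s for s in known_services if s in text), "unknown")
    return Event(datetime.fromisoformat(ticket["opened"]), service,
                 "ticket", ticket["description"])


def timeline(events):
    """Merge normalized events into one chronologically ordered view."""
    return sorted((e for e in events if e is not None), key=lambda e: e.timestamp)
```

Once every source shares one record type, later steps such as grouping and ranking can treat a ticket, a log line, and an alert uniformly.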

How It Works

Watson AIOps groups diverse sets of log anomalies and alerts based on spatial and temporal reasoning, as well as on similarity to past situations. It then points to where the problem is occurring and identifies other services that might be affected (commonly known as the blast radius). It presents the details of the problem, drawn from data in the environment’s existing tools, in the context of the application topology, distilling multiple signals into a succinct report.
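The grouping and blast-radius ideas can be illustrated with simple heuristics. This is a minimal sketch, assuming events arrive as (timestamp in seconds, service) pairs and the application topology is a plain dictionary mapping each service to the services that call it. It groups events that occur close together in time, computes a blast radius with a breadth-first search, and picks the service whose failure best explains the rest of its group. These heuristics and all names are illustrative assumptions, not Watson AIOps’s actual algorithms.

```python
from collections import deque


def group_by_time(events, window_seconds=300):
    """Split (timestamp_seconds, service) events into bursts.

    A new group starts whenever the gap since the previous event
    exceeds `window_seconds` (temporal reasoning).
    """
    groups = []
    for ts, service in sorted(events):
        if groups and ts - groups[-1][-1][0] <= window_seconds:
            groups[-1].append((ts, service))
        else:
            groups.append([(ts, service)])
    return groups


def blast_radius(dependents, origin):
    """Return services that directly or transitively depend on `origin`."""
    affected, queue = set(), deque([origin])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            if caller not in affected and caller != origin:
                affected.add(caller)
                queue.append(caller)
    return affected


def likely_origin(group, dependents):
    """Pick the alerting service whose blast radius covers most of the group.

    Spatial reasoning: if one service's failure explains the other
    alerts in the burst, treat it as the probable source.
    """
    services = {service for _, service in group}
    return max(sorted(services),
               key=lambda s: len(blast_radius(dependents, s) & services))


topology = {"db": ["payments", "inventory"], "payments": ["checkout"],
            "inventory": ["checkout"], "checkout": ["web"]}
alerts = [(0, "checkout"), (30, "db"), (45, "payments"), (60, "web"), (5000, "search")]
first_burst = group_by_time(alerts)[0]
print(likely_origin(first_burst, topology), sorted(blast_radius(topology, "db")))
# prints: db ['checkout', 'inventory', 'payments', 'web']
```

Here the `search` alert lands in its own group because it arrives long after the others, while `db` is chosen as the origin because its failure accounts for the other three alerting services.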

Watson AIOps uses IBM’s leading natural language processing (NLP) technology to understand the content of tickets and automatically extract resolution actions. When a new issue is identified, Watson AIOps finds similar past incidents and recommends the next best actions to restore service. With these insights, teams can apply predictive and proactive capabilities to drive more automation, freeing operations staff for higher-value work.
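Matching a new issue to similar past incidents can be illustrated with a far simpler technique than IBM’s NLP. This is a minimal sketch, assuming past tickets are stored as (description, resolution) pairs: it scores each past incident by bag-of-words cosine similarity to the new issue and returns the resolutions of the closest matches. Word overlap is a deliberately crude stand-in for natural language understanding, and `recommend_actions` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend_actions(new_issue, past_incidents, top_k=2):
    """Rank past incidents by textual similarity and return their fixes.

    `past_incidents` is a list of (description, resolution) pairs.
    """
    query = Counter(tokenize(new_issue))
    scored = [(cosine(query, Counter(tokenize(desc))), fix)
              for desc, fix in past_incidents]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(round(score, 2), fix) for score, fix in scored[:top_k] if score > 0]


past = [
    ("checkout service timeout connecting to payments database",
     "restart payments connection pool"),
    ("disk full on logging node", "rotate and compress old logs"),
    ("payments database connection refused", "fail over to replica database"),
]
print(recommend_actions("checkout timeout on payments database", past))
# prints: [(0.68, 'restart payments connection pool'), (0.45, 'fail over to replica database')]
```

Returning the similarity score with each resolution gives an SRE a simple signal for how closely the recommendation matches history.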

IBM is at the forefront of developing trusted, explainable AI technologies that help SREs understand the reasoning behind a Watson AIOps recommendation. Maintaining transparency and explainability is critical to building trust in an AI system’s actions, and IBM continues to develop AI solutions that inspire confidence, including Watson OpenScale.

The Technology Behind the Service

As with much of IBM’s AI development, significant portions of the technologies underlying Watson AIOps were born out of IBM Research. This new offering, part of what we’re calling AI for IT, is the culmination of years of research and development into how AI can be used to transform the IT lifecycle. Learn more about that work from Ruchir Puri, Chief Scientist at IBM Research.

IBM Partners and Integrations

IBM has partnered with Slack to provide what we think is a world-class ChatOps experience. ChatOps bypasses the traditional process of creating and responding to help tickets and support emails. When an issue arises, Watson AIOps can alert specific engineers or groups inside Slack, and they can then direct the system toward a resolution or deploy code without leaving the chat environment. Everyone stays on the same page, and all of these actions are logged in one convenient place. This partnership, along with integration with Box, offers an immediate solution for engineers who have unexpectedly been dispersed to work from home during the COVID-19 pandemic, and it supports a future in which IT services are more distributed as a matter of course.
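The ChatOps pattern of acting on alerts from a chat channel, with every action recorded in one log, can be sketched without any chat platform. This is a minimal sketch, assuming messages arrive as (user, text) pairs and commands start with `!`: it dispatches commands to registered handlers and appends each one to an audit log. It does not use the Slack API or reflect how Watson AIOps integrates with Slack; `ChatOpsBot` and the `!restart` command are hypothetical.

```python
from datetime import datetime, timezone


class ChatOpsBot:
    """Hypothetical bot that runs chat commands and logs every action."""

    def __init__(self):
        self.handlers = {}   # command name -> function(args) -> reply text
        self.audit_log = []  # (UTC time, user, message, reply) tuples

    def command(self, name):
        """Decorator that registers a handler for messages like '!name ...'."""
        def register(func):
            self.handlers[name] = func
            return func
        return register

    def handle_message(self, user, text):
        """Dispatch a command message and record it in the audit log."""
        parts = text[1:].split() if text.startswith("!") else []
        if not parts:
            return None  # ordinary conversation, not a command
        name, args = parts[0], parts[1:]
        handler = self.handlers.get(name)
        reply = handler(args) if handler else f"Unknown command: {name}"
        # Every command, successful or not, is recorded in one place.
        self.audit_log.append((datetime.now(timezone.utc), user, text, reply))
        return reply


bot = ChatOpsBot()


@bot.command("restart")
def restart(args):
    # A real integration would call a deployment or orchestration API here.
    return f"Restarting {args[0]}..." if args else "Usage: !restart <service>"


print(bot.handle_message("alice", "!restart checkout"))  # prints: Restarting checkout...
```

Keeping the audit log inside the dispatcher, rather than in each handler, guarantees that unknown or failed commands are recorded as well.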

IBM has also partnered with best-in-class monitoring solutions such as PagerDuty, LogDNA, and Sysdig to deliver holistic insights across today’s IT environments. Beyond these, IBM Watson AIOps integrates with other IT Ops tools, is highly customizable, and runs on Red Hat OpenShift, so it can be deployed on any cloud.

[1] IDC FutureScape: Worldwide Digital Transformation 2020 Predictions, Doc # US45569118, Oct 2019

Explore the Watson AIOps product page.

Wondering how your enterprise should apply AI to IT? Take our assessment.
