May 5, 2020 By Jessica Rockwood 4 min read

When a component in an application’s end-to-end workflow becomes unavailable and affects internal users or external clients, the clock starts ticking, and customer satisfaction can suffer significantly. Market research firm Aberdeen pegs the cost of an outage at about $260,000 per hour. And many businesses are ill-equipped to resolve such outages promptly.

But there’s some good news for frustrated CIOs. Market intelligence firm IDC predicts that, by 2024, enterprises that are powered by AI will be able to respond to customers, competitors, regulators, and partners 50% faster than those that are not using AI.[1]

Dispersed Data, Imminent Problems

The central problem many IT departments face is that the vast volumes of data the modern enterprise constantly processes from various sources can’t be monitored in real time using traditional data analysis techniques or applications. When issues occur, it can take hours or even days to troubleshoot their root cause.

But there is a bright light on the horizon for CIOs. Over the last decade, the IT industry has seen the rise of a new set of frameworks for IT operations. DevOps and DataOps revolutionized the way IT departments integrate with the rest of the enterprise. Now a new industry methodology called AIOps further extends the ability of the IT department to respond to change and address issues in real time.

What Can AI Do?

AI is quickly becoming an essential component of today’s IT departments because it can be used to automate how enterprises detect, identify and respond to potentially costly or catastrophic IT anomalies during an event or even before they occur. AI solutions can address the vast volumes of data, structured and unstructured, that traditional system monitoring tools were not designed to oversee with a singular view.

AI can collect data from a heterogeneous array of sources across the IT infrastructure, from performance alerts to incident tickets. This data can be used, for example, to enable cost reductions and help achieve improved productivity by recognizing a specific time of day when demand on IT resources is low, and shifting compute resources automatically. If automatic adjustments are not desired, data can be displayed in a visual format that provides IT operations managers or Site Reliability Engineers (SREs) with recommended courses of action, and explains the rationale behind those recommendations. AI can automate tasks like shifting traffic from one router to another, freeing up space on a drive, or restarting an application. AI systems can also be trained to self-correct so IT managers and their teams can spend their time on higher value work, while simultaneously getting full visibility into the enterprise’s operations.
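As a rough illustration of the low-demand example above, the sketch below averages utilization by hour of day and picks the quietest hour. The function name, the sample data and the (hour, utilization) format are assumptions made for this example; it is a simple averaging heuristic, not how Watson AIOps is implemented.

```python
from collections import defaultdict
from statistics import mean


def quietest_hour(samples):
    """Return the hour of day (0-23) with the lowest average utilization.

    `samples` is an iterable of (hour_of_day, utilization) pairs, where
    utilization is a fraction between 0.0 and 1.0 (hypothetical format).
    """
    by_hour = defaultdict(list)
    for hour, utilization in samples:
        by_hour[hour].append(utilization)
    # Average each hour's readings, then pick the hour with the smallest average.
    averages = {hour: mean(values) for hour, values in by_hour.items()}
    return min(averages, key=averages.get)


# Made-up utilization readings for illustration only.
samples = [(2, 0.10), (2, 0.14), (9, 0.80), (14, 0.65), (14, 0.75), (23, 0.30)]
low_hour = quietest_hour(samples)
print(f"Lowest average demand at hour {low_hour}")
```

A scheduler could use an hour like this as a candidate window for shifting compute resources, or simply surface it to an SRE as a recommendation.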

Introducing IBM Watson AIOps

We’re excited to announce Watson AIOps, a new product that leverages machine learning, natural language understanding, explainable AI and other technologies to automate IT operations. Drawing from advances made at IBM Research, Watson AIOps gives businesses the ability to address and shape future outcomes, transitioning from a reactive posture toward proactive strategies. It is designed to introduce cost and personnel efficiencies, improve resilience across the enterprise’s information architecture, and speed issue resolution.

Watson AIOps is trained to connect the dots across data sources and common IT industry tools in real time, helping to quickly detect and identify issues. It extends beyond traditional structured sources of operational data, like metrics and alerts, to semi-structured and unstructured data like logs and tickets, and combines them using machine learning and natural language understanding to create a synthesized, holistic problem report that helps identify and address the situation.

How It Works

Watson AIOps groups diverse sets of log anomalies and alerts based on spatial and temporal reasoning as well as similarity to past situations. Then, it provides a pointer to where the problem is occurring and identifies other services that might be affected, commonly known as the blast radius. It does this by showing details of the problem based on data from existing tools in the environment, all in the context of the application topology, distilling multiple signals into a succinct report.
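To make the ideas of temporal grouping and blast radius concrete, here is a minimal sketch under stated assumptions. The topology, the alert format, the five-minute window and both function names are hypothetical. The code groups alerts by time gaps and walks a dependency graph breadth-first, which is a simplification and not the reasoning Watson AIOps actually performs.

```python
from collections import deque

# Hypothetical topology: each service maps to the services that depend on it.
DEPENDENTS = {
    "db": ["orders", "inventory"],
    "orders": ["checkout"],
    "inventory": [],
    "checkout": [],
    "search": [],
}


def group_alerts(alerts, window_seconds=300):
    """Group alerts that arrive within `window_seconds` of the previous alert."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        if groups and alert["ts"] - groups[-1][-1]["ts"] <= window_seconds:
            groups[-1].append(alert)  # close in time: same group
        else:
            groups.append([alert])    # gap too large: start a new group
    return groups


def blast_radius(service, dependents):
    """Return services reachable from `service` via dependency edges (breadth-first)."""
    seen = {service}
    queue = deque([service])
    affected = []
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                affected.append(downstream)
                queue.append(downstream)
    return affected


# Made-up alerts; `ts` is seconds since an arbitrary start time.
alerts = [
    {"ts": 0, "service": "db"},
    {"ts": 120, "service": "orders"},
    {"ts": 2000, "service": "search"},
]
groups = group_alerts(alerts)
radius = blast_radius("db", DEPENDENTS)
print(len(groups), radius)
```

In this toy data, the database and orders alerts land in one group because they occur two minutes apart, and the walk from the database reaches the orders, inventory and checkout services.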

Watson AIOps leverages IBM’s leading natural language processing (NLP) technology to understand the content in tickets to identify and extract resolution actions automatically. As a new issue is identified, Watson AIOps will identify similar past incidents and provide the recommended next best actions to address the current issue to restore service. With the insight from Watson AIOps, predictive and proactive capabilities can be leveraged to drive more automation, shifting operations teams to higher value work.
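The idea of matching a new issue to similar past incidents can be sketched with a simple word-overlap measure. This example swaps in Jaccard similarity over lowercase tokens as a stand-in; it is not IBM’s NLP technology, and the ticket fields, sample data and function names are assumptions made for illustration.

```python
import re


def tokens(text):
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(new_ticket, past_incidents):
    """Return the past incident whose summary overlaps most with `new_ticket`."""
    new_tokens = tokens(new_ticket)
    return max(
        past_incidents,
        key=lambda incident: jaccard(new_tokens, tokens(incident["summary"])),
    )


# Hypothetical incident history with the resolution recorded for each ticket.
PAST = [
    {"summary": "Disk full on database server", "resolution": "Free space on the drive"},
    {"summary": "Checkout application not responding", "resolution": "Restart the application"},
]

match = most_similar("database server disk is full", PAST)
print("Suggested next action:", match["resolution"])
```

A production system would need far more than token overlap (synonyms, ticket structure, context from the topology), but the retrieval pattern is the same: find the closest past incident, then surface the action that resolved it.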

IBM AI innovations are at the forefront of developing trusted and explainable technologies to help SREs interpret the reason behind a Watson AIOps recommendation. Maintaining transparency and explainability is critical to building trust in an AI system’s actions, and IBM continues to develop AI solutions that inspire confidence, including Watson OpenScale.

The Technology Behind the Service

As is the case with much of IBM’s AI development, significant portions of the technologies underlying Watson AIOps were born out of IBM Research. This new offering, part of what we’re calling AI for IT, is the culmination of years of research and development at IBM Research into how AI can be used to transform the IT lifecycle. Learn more on that here from IBM Research’s Chief Scientist, Ruchir Puri.

IBM Clients Using AIOps

IBM has partnered with Slack to provide what we think is a world-class ChatOps experience. ChatOps bypasses the traditional method of creating and responding to help tickets and support emails. When an issue arises, specific engineers or groups can be alerted by Watson AIOps from inside Slack. Then they can direct the system toward resolutions or deploy code, without needing to leave the chat environment. Everyone’s on the same page and all of these actions are logged in one convenient place. This partnership, along with integration with Box, represents an immediate solution to the current unplanned distribution of engineers who are working from home due to the COVID-19 pandemic and supports a future where IT services are more distributed as a matter of course.

IBM has also partnered with providers of best-in-class monitoring solutions such as PagerDuty, LogDNA and Sysdig to deliver holistic insights across today’s IT environments. In addition, IBM Watson AIOps integrates with other IT Ops tools, is highly customizable, and is built on Red Hat OpenShift so that it can run on any cloud.

[1] IDC FutureScape: Worldwide Digital Transformation 2020 Predictions, Doc # US45569118, Oct 2019

Explore the Watson AIOps product page.

Wondering how your enterprise should apply AI to your IT? Take our assessment.
