AIOps reimagines hybrid multicloud platform operations

4 minute read | July 27, 2022

Today, most enterprises use services from more than one cloud service provider (CSP). Gaining operational visibility across all of these vendors is a common pain point for clients. Further, modern architectures such as microservices introduce additional operational complexity.

Figure 1 Hybrid Multicloud and Complexity Evolution

Traditionally, this complexity is met with more manpower, but that approach introduces challenges of its own. As shown in the following diagram, a single issue in the environment triggers several events across the full stack of the business solution, which results in an unmanageable event flood. Moreover, full-stack observability often produces duplicate events, and these events end up in data silos.

Figure 2 IT Service Management Complexity

IT is a critical part of every enterprise today, and even a small service outage directly affects the top line. Consequently, it is not uncommon for clients to ask for a 30-minute resolution commitment when something goes wrong. This is usually not enough time for a human to resolve an issue.

What is the solution?

This is where AIOps comes to the rescue, preventing these issues before they occur. AIOps is the application of artificial intelligence (AI) to enhance IT operations. Specifically, AIOps uses big data, analytics, and machine learning capabilities to do the following:

  • Collect and aggregate the huge and ever-increasing volumes of operations data generated by multiple IT infrastructure components, applications and performance-monitoring tools
  • Identify significant events and patterns related to system performance and availability issues
  • Diagnose root causes and report them to IT for rapid response and remediation, or automatically resolve these issues without human intervention
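The first two capabilities above can be illustrated with a minimal sketch. It assumes events have already been normalized into a common shape; the `Event` fields, the tool names and the 60-second suppression window are hypothetical choices for illustration, not part of any specific AIOps product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str     # monitoring tool that raised the event (hypothetical names)
    resource: str   # affected component
    symptom: str    # normalized symptom label
    timestamp: int  # seconds since an arbitrary start


def deduplicate(events, window=60):
    """Suppress an event when the same (resource, symptom) pair was
    seen no more than `window` seconds before it."""
    last_seen = {}
    unique = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.resource, ev.symptom)
        if key not in last_seen or ev.timestamp - last_seen[key] > window:
            unique.append(ev)
        last_seen[key] = ev.timestamp
    return unique


def group_by_resource(events):
    """Group the surviving events so each resource reads as one incident candidate."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev.resource].append(ev)
    return dict(groups)


# Three tools report the same database latency problem within seconds.
events = [
    Event("infra-monitor", "db-01", "high_latency", 0),
    Event("apm-tool", "db-01", "high_latency", 5),
    Event("log-analyzer", "db-01", "high_latency", 12),
    Event("apm-tool", "checkout-svc", "error_rate", 20),
    Event("infra-monitor", "db-01", "disk_full", 30),
]

unique = deduplicate(events)
incidents = group_by_resource(unique)
for resource, evs in incidents.items():
    print(resource, [e.symptom for e in evs])
```

Here five raw events collapse into three, grouped under two resources. A production engine would typically learn such correlations from data rather than rely on a fixed key and window.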

By replacing multiple manual IT operations tools with an intelligent, automated platform, AIOps enables IT operations teams to respond more quickly and proactively to slowdowns and outages, with less effort. It bridges the gap between an increasingly difficult-to-monitor IT landscape and user expectations for little to no interruption in application performance and availability. Most experts consider AIOps the future of IT operations management.

How could we reimagine cloud service management and operations with AI?

Refer to the lower part of the diagram below (box 3: Environment), which represents the environments where the workloads run. Continuous releases and deployments of these applications are typically achieved through the continuous delivery process and tooling that is shown on the left side of the diagram (box 2: Continuous Delivery).

Figure 3 AI Infused DevSecOps and IT Control Tower

The applications continuously send telemetry information into the operational management tooling (box 4: Continuous Operations). Both the continuous delivery tooling and the continuous operations tooling ingest all the data into the AIOps engine shown at the top (box 7: AIOps Engine). The AIOps engine is focused on addressing four key things:

  1. Descriptive analytics to show what happened in an environment
  2. Diagnostics to show why it happened
  3. Predictive analytics to show what will happen next
  4. Prescriptive analytics to show how to achieve or prevent the prediction
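To make the four kinds of analytics concrete, here is a deliberately simple sketch applied to one metric. The disk-usage figures, the change log entry, the 85% threshold and the recommended action are all invented for illustration, and a straight-line trend stands in for whatever models a real AIOps engine would use.

```python
from statistics import mean

# Hypothetical hourly disk-usage percentages for one volume.
usage = [50.0, 51.0, 52.0, 60.0, 68.0, 76.0]
# Hypothetical change log: hour index -> change made in that hour.
change_log = {3: "release 2.4 raised log retention"}


def describe(series):
    """Descriptive: summarize what happened."""
    return {"min": min(series), "max": max(series), "mean": mean(series)}


def diagnose(series, changes):
    """Diagnostic: find the hour with the largest increase and look up
    a recorded change in that hour as a candidate cause."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    hour = deltas.index(max(deltas)) + 1
    return hour, changes.get(hour, "no recorded change")


def predict(series, hours_ahead=3):
    """Predictive: extrapolate a least-squares straight line."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + hours_ahead)


def prescribe(forecast, threshold=85.0):
    """Prescriptive: recommend an action that prevents the predicted breach."""
    if forecast >= threshold:
        return "expand volume or roll back the suspected change"
    return "no action needed"


summary = describe(usage)
hour, cause = diagnose(usage, change_log)
forecast = predict(usage)
action = prescribe(forecast)
print(summary, hour, cause, round(forecast, 1), action)
```

Each function answers one of the four questions in turn: what happened, why, what comes next, and what to do about it.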

In addition to this, enterprise-specific data sources such as a shift roster, SME skill matrix or knowledge repository enrich the AIOps engine (box 1: Enterprise specific data).

Additionally, the AIOps engine consumes public domain data such as open-source community content, product documentation and sentiment from social networks (box 6: Public domain content). ChatOps and Runbook Automation ingest the insights and the automation that the AI system produces and use them to establish a new "day in the life" of an incident (box 5: Continuous Operations). ChatOps brings humans and chatbots together for conversation-driven collaboration, or conversation-driven DevOps. The AIOps engine also dynamically reconfigures the DevSecOps tools for continuous delivery and continuous operations through AI-derived policy ingestion.

Several products in the marketplace have already evolved to provide AIOps capabilities such as anomaly detection. This framework consumes the outcomes provided by these AIOps engines (denoted as edge analytics in Figure 3) and combines multiple sources to provide an enterprise-level view.
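As a rough sketch of this idea, the example below uses a basic z-score check as a stand-in for one product's anomaly detection, then merges its findings with those of two other tools into a single ranked view. The tool names, latency samples and threshold are hypothetical, and real products use far more sophisticated detectors.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_anomalies(samples, threshold=2.0):
    """Return indices of samples more than `threshold` standard
    deviations away from the mean (a simple stand-in detector)."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(samples) if abs(v - mu) / sigma > threshold]


def enterprise_view(edge_findings):
    """Count how many independent tools flagged each resource and
    rank resources by that count (ties broken by name)."""
    counts = Counter()
    for resources in edge_findings.values():
        counts.update(set(resources))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical latency samples (ms) seen by one monitoring tool.
latency_ms = {
    "db-01": [100, 102, 98, 101, 99, 100, 180, 101],
    "auth-svc": [20, 21, 19, 20, 20, 21, 19, 20],
}
flagged_by_infra = {r for r, s in latency_ms.items() if zscore_anomalies(s)}

# Findings from other tools are assumed to arrive as sets of resource names.
edge_findings = {
    "infra-monitor": flagged_by_infra,
    "apm-tool": {"db-01", "checkout-svc"},
    "log-analyzer": {"db-01"},
}

view = enterprise_view(edge_findings)
for resource, tool_count in view:
    print(f"{resource}: flagged by {tool_count} tool(s)")
```

Ranking by the number of agreeing tools is only one possible way to combine sources; it is used here because it needs no training data.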

IT processes such as incident and problem resolution are ad hoc in nature. They differ greatly from structured business processes such as loan approval or claim settlement. IT processes have stringent SLAs because of the high cost of an outage to the business, and the personas involved collaborate intensely and interact with disparate tools to accomplish their goals. For these reasons, applying business process automation technologies to IT processes will not yield high productivity benefits. ChatOps has transformed the way ITOps teams collaborate to resolve IT incidents, and AIOps and ChatOps together are the appropriate tools to drive productivity in IT processes. ChatOps enhances the collaboration experience of SREs with the other personas participating in IT processes. AIOps delivers insights that help SREs accelerate the incident resolution process.

In a nutshell, as clients undertake large digital transformation programs based on a hybrid cloud (or multicloud) architecture, IT operations need to be reimagined. With ever-increasing complexity, AIOps is indispensable. To learn more about AI for IT operations and IBM's point of view, refer to IBM Consulting.