Frequently asked questions (FAQ)

Review the following frequently asked questions for IBM Cloud Pak for AIOps.

About IBM Cloud Pak for AIOps

What is IBM Cloud Pak for AIOps?

IBM Cloud Pak for AIOps is an AIOps platform developed by IBM to help organizations simplify complex IT operations and improve efficiency. The platform uses advanced, explainable AI to assess, diagnose, and resolve incidents across mission-critical workloads, easing the path to adopting AI for ITOps and decreasing operational costs. By automating routine tasks and providing advanced diagnostic capabilities, Cloud Pak for AIOps enables organizations to save time and resources, and to focus on innovation and strategic initiatives. Its explainable AI approach ensures transparency and trust in the decision-making process, which makes it well suited to high-stakes situations that require rapid and accurate decision-making. For a full overview of Cloud Pak for AIOps, go to Overview.

What are the capabilities of IBM Cloud Pak for AIOps?

Cloud Pak for AIOps capabilities include:

  • Cross-domain data ingestion and integration
  • Topology generation
  • Event correlation and analytics
  • Incident and pattern recognition
  • Augmented remediation

How can Cloud Pak for AIOps assist an Ops engineer?

An Ops engineer can leverage the powerful AI and analytics capabilities of the Cloud Pak for rapid incident remediation, augmented with tailored context and recommended actions. Use cases include incident remediation and service assurance.

How can Cloud Pak for AIOps assist an Ops Manager?

Operations Managers can more effectively understand the types of incidents that occur most frequently, and create policies and runbooks to improve their operations responsiveness and maximize uptime. Use cases include policy administration and incident triage management.

How can Cloud Pak for AIOps assist a Line of Business (LOB) engineer/CTO?

Line of Business Managers and CTOs can take advantage of the AIOps Insights Dashboard to understand how their applications are performing over time and the benefits that their AIOps Platform is providing to the business. A use case is application or service reporting.

How can Cloud Pak for AIOps assist a Site Reliability Engineer (SRE)?

An SRE can use Change Risk Assessment and the Topology timeline view to ensure that their applications run smoothly, even when making changes or upgrades. Use cases include incident resolution and change risk mitigation.

What are the benefits of using IBM Cloud Pak for AIOps?

The main benefit is the automation of every aspect of incident resolution. With Cloud Pak for AIOps, you can:

  • Diagnose problems faster and correlate a vast amount of unstructured and structured data in real time
  • Automate with confidence and empower teams to automate tasks with transparent AI decision-making
  • Gain insights where you work and keep teams focused, incorporating insights and recommendations into existing workflows to make informed decisions
  • Build and manage securely: build policy at the microservice level and automate across application components
  • Integrate seamlessly with monitoring tools and pretrained AI models to gain new insights

About the documentation

Where can I learn about new or changed features for each release?

To learn about new features for the release, review What's new.

Where can I access known issues for each release?

To review known issues and limitations, go to Known issues.

Where can I find terms and definitions for IBM Cloud Pak for AIOps?

Terms and definitions for IBM Cloud Pak for AIOps can be found in the Glossary.

Where can I find out about deprecated and removed functions?

To find out more about deprecated and removed functions, go to Deprecated and removed functions.

About installing the Cloud Pak

Where can I find Cloud Pak for AIOps installation documentation?

AIOps installation documentation can be found here.

Where can I learn about the hardware requirements for Cloud Pak for AIOps?

Hardware requirements for Cloud Pak for AIOps can be found at Hardware requirements.

About administering and setting up the Cloud Pak

Where can I learn about administering the Cloud Pak?

To find out more about administration tasks such as adding users, backing up and restoring, and more, go to Cloud Pak administration.

About integrating with the Cloud Pak

Integrations in Cloud Pak for AIOps

There are over 160 industry-standard integrations in Cloud Pak for AIOps, which ingest events and alerts, metrics, topology, and logs from across your estate and tooling. You can create your own custom integrations by using generic integrations and SDKs. You can also leverage your existing Netcool Probes. Cloud Pak for AIOps provides integration configuration and management.

Which types of data are collected by Cloud Pak for AIOps?

There are two types of data collected: structured and unstructured. Unstructured data includes logs, tickets, and CI/CD, while structured data includes events (alerts), metrics, and topology changes.

IBM Tivoli Network Manager deployment and integration

Where IBM Tivoli Network Manager is present, it should be deployed in a resilient manner and connected to a Netcool/OMNIbus ObjectServer. Any events that it generates flow into Cloud Pak for AIOps through the Netcool Connector. An instance of DASH and Netcool/OMNIbus WebGUI with Topoviz installed is needed to access Structure Browser views. Its topology can be brought into Cloud Pak for AIOps through the IBM Tivoli Network Manager Observer.

What are best practice guidelines when including Netcool components with the Cloud Pak?

Whether you need to include Netcool components in an AIOps deployment depends on the requirements that you identify during planning. Where Netcool components are needed, their deployment should be governed by best practices. Some guidelines include:

  • All Netcool components should be deployed with the latest fix pack applied
  • All components should be deployed in a resilient manner
  • All components should be configured to auto-start on machine boot

How do I initiate custom integrations, automations, and workflows with IBM Netcool/Impact?

Any custom integrations, automations, or workflows that are done by using Netcool/Impact can either be automatically initiated from the Netcool/OMNIbus Aggregation ObjectServer pair through an Event Reader, or from AIOps directly with the Netcool/Impact Connector.

How does Infrastructure Automation (IA) work with AIOps?

You can manage your environment with Infrastructure Automation and take incident-resolving actions directly from Cloud Pak for AIOps. When used along with Cloud Pak for AIOps, Infrastructure Automation allows for:

  • Discovery of existing VMs
  • Inventory and lifecycle management of infrastructure
  • Compliance and policy management of VMs
  • Chargeback and reporting
  • Day 2 provisioning of VMs
  • Running of playbooks to automate VM and middleware configuration
  • Configuration actions to remediate policy violations with Ansible

About the features of Cloud Pak for AIOps

How does Cloud Pak for AIOps calculate change risk?

Cloud Pak for AIOps discovers the relationship between changes and incidents for risk prediction. It uses state-of-the-art deep learning models for change request risk prediction, and provides explainability by presenting prior, related changes that were problematic.

How are incidents handled in Cloud Pak for AIOps?

Cloud Pak for AIOps provides a complete, integrated overview of your existing operational data (events, metrics, logs, and topology) across your silos. No change to instrumentation is needed.

It provides event processing, compression, and correlation to identify service-impacting events, and early detection of anomalies across events, metrics, and logs to enable proactive issue awareness and increased context for an incident.

It also provides probable cause localization and blast radius analysis, so that you know what caused the problem and what is affected.

How does Cloud Pak for AIOps help you resolve issues?

It applies recommended actions to remediate the issue right away, and allows you to choose the level of automation that is right for the task: manual step-by-step guidance, semi-automation with guidance and automated commands, or full auto-triggered automation. It also allows you to collaborate and align across teams by using automatically generated, enriched Ops information, analysis, and guidance.

How does Cloud Pak for AIOps help you understand your managed environment?

Cloud Pak for AIOps helps you visualize and understand your heterogeneous environment, including key metrics, from a single screen. It offers a comprehensive application representation, even when “truth” spans multiple sources of insight. It gives a full view of changes to your environment by using a historical timeline.

What are the different types of event correlation in Cloud Pak for AIOps?

Cloud Pak for AIOps uses three different event correlation methods that work together to correlate events.

  • Temporal

    Event analytics groups events that have repeatedly occurred together in the past.

  • Scope-based

    Operations teams can define their own groupings based on local knowledge.

  • Topology-based

    You define templates for groups of related resources. Cloud Pak for AIOps automatically finds all instances that match the template.
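
As an illustration of the temporal method only, the following minimal sketch counts how often pairs of event types appear in the same time window and keeps the pairs that co-occur often enough. It is not the Cloud Pak for AIOps implementation; the function name `cooccurring_pairs`, the `min_support` threshold, and the sample event types are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(windows, min_support=3):
    """Return pairs of event types seen together in at least min_support windows.

    windows: list of sets, each holding the event types observed in one
    historical time window (hypothetical input shape for this sketch).
    """
    pair_counts = Counter()
    for window in windows:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(window), 2):
            pair_counts[pair] += 1
    return {pair for pair, count in pair_counts.items() if count >= min_support}


history = [
    {"link_down", "bgp_flap"},
    {"link_down", "bgp_flap", "cpu_high"},
    {"bgp_flap", "link_down"},
    {"cpu_high"},
]
print(cooccurring_pairs(history))
```

With this sample history, only the `bgp_flap` and `link_down` pair reaches the threshold of three shared windows, so those two event types would be candidates for a temporal group.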

What is log anomaly detection?

It is the detection of anomalies from log messages and includes:

  • Anomalous time period prediction
  • Entity mentions in error logs
  • Explanation and pointer to log messages from anomalous time periods

What is metric anomaly detection?

It is the detection of anomalies from time series metrics including:

  • Deviation from normal operating range
  • Change from variable to flat
  • High and low range changes
  • Exceed previous range
  • Exceed normal range variance
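
Two of the anomaly types in this list can be sketched with simple statistics. The example below is a hedged illustration, not the product's algorithm: it assumes a baseline of mean plus or minus `k` standard deviations as the "normal operating range", and the function names, the `k` value, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def outside_normal_range(history, value, k=3.0):
    """True when value deviates more than k standard deviations from the
    mean of history (history needs at least two points)."""
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) > k * sigma


def variable_to_flat(history, recent, tolerance=1e-9):
    """True when history varied but the recent window shows no variation."""
    was_variable = stdev(history) > tolerance
    is_flat = max(recent) - min(recent) <= tolerance
    return was_variable and is_flat


baseline = [50, 52, 49, 51, 50, 48, 52, 50]
print(outside_normal_range(baseline, 90))            # True
print(outside_normal_range(baseline, 51))            # False
print(variable_to_flat(baseline, [50, 50, 50, 50]))  # True
```

A fixed statistical band is the simplest possible baseline. A production system would typically also account for trends and time-of-day effects, which this sketch ignores.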

What is probable cause?

You can derive probable fault components using vertex-weighted topology graph traversal and a reasoning engine to understand the meaning of the topology relationships. It allows for localization to the most likely source of an issue within the application topology.

What is incident similarity?

For a given problem description, it helps find the top 'k' ranked similar incidents from the past. It helps you understand the current issue and the actions that previously resolved similar issues. It consumes tickets and any data from the ticket progression to closure, including human-written investigation and resolution actions. It uses entity-action extraction and action sequence mining to understand tickets and summarize what was done.
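
The idea of a top 'k' ranking can be illustrated with a minimal sketch. It swaps in plain Jaccard similarity over word sets, which is much simpler than the entity-action extraction that is described above, and the ticket IDs, the field names, and the `top_k_similar` function are assumptions for this example.

```python
def tokenize(text):
    """Lowercase a description and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words that two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(description, past_tickets, k=2):
    """Return IDs of the k past tickets most similar to the description."""
    query = tokenize(description)
    scored = [
        (jaccard(query, tokenize(ticket["summary"])), ticket["id"])
        for ticket in past_tickets
    ]
    # Highest score first; ties are broken by ticket ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ticket_id for score, ticket_id in scored[:k] if score > 0]


tickets = [
    {"id": "INC001", "summary": "database connection pool exhausted on payments service"},
    {"id": "INC002", "summary": "disk full on logging node"},
    {"id": "INC003", "summary": "payments service database timeout"},
]
print(top_k_similar("payments database connection timeout", tickets))
# ['INC003', 'INC001']
```

Tickets with no words in common are dropped rather than returned with a zero score, so the result can contain fewer than `k` entries.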

What is fault localization and blast radius?

You can derive the full scope of affected components by using vertex-weighted topology graph traversal and a reasoning engine to understand the meaning of the topology relationships. Blast radius is determined through directional dependency analysis of the related components that interact with the localized source of the issue.
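
Directional dependency analysis can be pictured as a traversal over "depends on" relationships. The sketch below is a simplified, unweighted breadth-first search, not the vertex-weighted traversal and reasoning engine that the product uses; the topology, the resource names, and the `blast_radius` function are hypothetical.

```python
from collections import deque


def blast_radius(depends_on, source):
    """Return every resource that directly or transitively depends on source.

    depends_on maps each resource to the resources that it depends on.
    """
    # Invert the edges: for each resource, record what depends on it.
    dependents = {}
    for resource, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(resource)

    affected = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent != source and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


topology = {
    "web": ["api"],
    "api": ["db", "cache"],
    "reports": ["db"],
    "db": [],
    "cache": [],
}
print(sorted(blast_radius(topology, "db")))  # ['api', 'reports', 'web']
```

Direction matters here: a fault in `db` reaches `web` through `api`, but a fault in `web` affects nothing upstream, so its blast radius is empty.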

What is event seasonality?

With event seasonality, you can automatically discover events that occur in a regular pattern. This helps you identify chronic issues that can otherwise go undetected, provides valuable insights for problem-solving, and supports continuous learning over days, weeks, months, and years.
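
One simple way to picture a regular pattern is an event that almost always fires in the same hour of the day. The following sketch checks for that single kind of seasonality. It is an illustration under stated assumptions, not the product's method: the `dominant_hour` function and the `min_share` threshold are hypothetical, and real seasonality detection would also consider day-of-week and day-of-month patterns.

```python
from collections import Counter
from datetime import datetime


def dominant_hour(timestamps, min_share=0.8):
    """Return the hour of day that holds at least min_share of the
    occurrences, or None when no single hour dominates."""
    if not timestamps:
        return None
    hours = Counter(ts.hour for ts in timestamps)
    hour, count = hours.most_common(1)[0]
    return hour if count / len(timestamps) >= min_share else None


# An event that fires shortly after 02:00 every night for a week.
nightly = [datetime(2024, 1, day, 2, 5) for day in range(1, 8)]
# An event with no obvious time-of-day pattern.
scattered = [datetime(2024, 1, 1, hour, 0) for hour in (1, 5, 9, 13, 17)]

print(dominant_hour(nightly))    # 2
print(dominant_hour(scattered))  # None
```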

What is change risk prediction?

It allows you to assess the risk of each proposed change based on issues that were caused by historical changes. You can harvest and analyze the change ticket history to identify changes that implicitly failed when applied. You can also identify changes that resulted in subsequent issues after they were rolled out.

What is event grouping with entity linking?

It groups events, alerts, and anomalies to reduce the number of tickets. The groupings can be:

  • Topological - which groups events that are related or connected (or both)
  • Temporal - which is how you automatically discover events that tend to co-occur
  • Scope - which automatically groups events based on scope
  • Supergroup - which is a group of groups
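
A "group of groups" can be sketched as a merge of groups that overlap. The example below assumes that groups are combined into a supergroup when they share at least one event, which the text above does not state; the group names, the event IDs, and the `build_supergroups` function are hypothetical.

```python
def build_supergroups(groups):
    """Merge groups that share at least one event ID.

    groups maps a group name to a set of event IDs. Returns a list of
    (group_names, event_ids) tuples, one per supergroup.
    """
    supergroups = []
    for name, events in groups.items():
        merged_names, merged_events = {name}, set(events)
        remaining = []
        for names, existing_events in supergroups:
            if existing_events & merged_events:
                # Overlap found: fold the existing supergroup into this one.
                merged_names |= names
                merged_events |= existing_events
            else:
                remaining.append((names, existing_events))
        remaining.append((merged_names, merged_events))
        supergroups = remaining
    return supergroups


groups = {
    "temporal-1": {"e1", "e2"},
    "topology-1": {"e2", "e3"},
    "scope-1": {"e9"},
}
for names, events in build_supergroups(groups):
    print(sorted(names), sorted(events))
```

In this sample, the temporal and topological groups share event `e2` and therefore merge into one supergroup, while the scope-based group stays on its own.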

What are the characteristics of Cloud Pak for AIOps algorithms?

  • Preprocessing

    Each algorithm has its own preprocessing chain, which performs a chained set of transforms on the data. Cloud Pak for AIOps validates that the data fits the algorithm and will result in successful training.

  • Unsupervised

    The algorithms are built by IBM and validated with tested data. They build models without labels: Cloud Pak for AIOps algorithms create models of “normal” based on your data.

  • Validation

    Validation checks whether the model is overfitting or underfitting the data. The algorithms self-adjust the model over time.

What are the advantages of AI-powered over traditional incident management?

IT teams rely on incident management to find and fix any number of unplanned events, from the mundane to the profound. The question to ask is: as your systems have evolved, has your approach to incident management kept pace? An AI-powered solution can help you stay competitive and keep your business operations running smoothly. The advantages include:

  • Reaction time

    AI-powered incident management offers teams proactivity and efficiency. It automatically analyzes information from the environment to discover, predict, and prevent incidents before they can impact business processes, which can result in faster time to identify incident causes, and helps maximize uptime and enhance customer satisfaction.

  • Data management

    It ingests and correlates data from various sources to provide a comprehensive view of the entire IT landscape. It helps IT operations teams quickly identify the underlying causes of incidents. It promotes collaboration between teams, subject matter experts, and stakeholders. It expedites incident resolution, reduces Mean Time To Repair (MTTR), and improves overall operational efficiency.

  • Scalability

    AI and automation enable scalability. AI can analyze vast amounts of data in real time to handle increases in the volume and variety of incidents. Teams can more effectively manage incidents, regardless of the size and sophistication of their IT environment.

  • Predictability

    AI anticipates issues and prevents them proactively. It can detect thresholds automatically and identify patterns, trends, and potential risks that can lead to incidents. It enhances operational efficiency by empowering preventive actions, such as proactive maintenance and capacity adjustments, that reduce the likelihood of incidents occurring.

What is similar ticket analysis?

Similar tickets is an unsupervised learning algorithm that aggregates information about similar messages, anomalies, and events for a component or a service. It can also extract the steps used to fix previous incidents, if documented. It does this by connecting to your ServiceNow instance and analyzing the existing tickets in it. After training completes, Cloud Pak for AIOps automatically alerts users who are working on an incident to any similar tickets that were raised in the past.