AIOps Insights dashboard

Use the AIOps Insights dashboard to view key insights provided by IBM Cloud Pak® for AIOps. View insights for a 24-hour, 7-day, or 30-day period. This information can show the value of using IBM Cloud Pak for AIOps for your IT operations.

The AIOps Insights dashboard can be accessed from the administration page of IBM Cloud Pak for AIOps, under Overview > Quick navigation > AIOps insights. Alternatively, click the navigation icon (four horizontal bars) in the upper-left corner of the screen to go to the main navigation menu. Then click Operate > AIOps insights.

Figure. AIOps Insights dashboard

Prerequisites

All relevant data integrations and activity (incident opened, active runbook, and so on) must be operational to allow live data to flow into the chart.

By default, all users have access to AIOps Insights. However, administrators can restrict access by changing a user's permissions. For more information, see Roles and permissions.

Note: If you have not yet set up integrations or implemented activity (or only limited information is available), you can view examples of each AIOps Insights chart, containing sample data, by taking the AIOps Insights product tour. Click the signpost icon beside your avatar, then click Product Tours > Discover AIOps insights.

Summary bar

The summary bar of the dashboard displays key indicators and shows how IBM Cloud Pak for AIOps is improving the efficiency of your IT operations teams. It summarizes the charts currently showing on the dashboard. An arrow indicates an increase or decrease in activity.

Figure. Summary bar

For more detailed information on metrics in the summary bar, view the individual charts.

Mean time to restore

The Mean time to restore chart displays the average time that is needed to isolate, repair, and fully resolve IT incidents.

Figure. Mean time to restore chart

Chart definitions:

Acknowledge: the mean time that it takes to begin work on an incident. Toggle the purple Acknowledge box in the legend to display only this data on the chart.

Repair: the mean time taken to reestablish a service. Toggle the green Repair box in the legend to display only this data on the chart.

Restore: the total mean time (Acknowledge time and Repair time combined) taken to restore service.

Note: All mean values are calculated by dividing the item total by the number of data points on the chart. Because of the time intervals, this number is always the selected time period plus one unit: 24-hour totals are divided by 25, 7-day totals by 8, and 30-day totals by 31.
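The averaging rule in the preceding note can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the function names, the period keys, and the use of minutes as the unit are all assumptions made for the example.

```python
# Number of data points on the chart for each selectable period:
# the period length plus one unit, as stated in the note.
DATA_POINTS = {"24h": 25, "7d": 8, "30d": 31}


def chart_mean(item_total: float, period: str) -> float:
    """Mean for a chart item: the item total divided by the data points.

    `item_total` and the period keys are hypothetical names for this sketch.
    """
    return item_total / DATA_POINTS[period]


def restore_mean(acknowledge_mean: float, repair_mean: float) -> float:
    """Restore is Acknowledge time and Repair time combined."""
    return acknowledge_mean + repair_mean


# Example: 200 minutes of Acknowledge time and 600 minutes of Repair time
# accumulated over a 7-day period (8 data points).
ack = chart_mean(200, "7d")
repair = chart_mean(600, "7d")
restore = restore_mean(ack, repair)
```

Under these assumptions, a 7-day total is spread over 8 data points rather than 7, so the displayed mean is slightly lower than a plain per-day average.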

Incident activity

Within IBM Cloud Pak for AIOps, an incident is a collection of alerts, insights, and potential solutions to help drive incident remediation.

Click the chart icon in the Incident activity chart to switch between the Time series view (default) and the Summary view.

  • The Time series chart displays the number of incidents that were opened and closed during the selected time period.

Figure. Incidents closure rate

  • The Summary chart displays open and closed incidents according to their severity level, over the selected time period.

Figure. Summary setting

Chart definitions:

Previously opened: incidents carried over from the preceding time unit (for example, in a 7-day period, it would be incidents that are carried over from the day before), and previously opened incidents that are not yet resolved. Hover the cursor over the purple bar to see specific numbers for each day. Toggle the purple Previously opened box in the legend to display only this data on the chart.

Newly opened: incidents opened in the selected time period. Hover the cursor over the green bar to see specific numbers for each day. Toggle the green Opened box in the legend to display only this data on the chart.

Closed: the number of incidents that closed in the selected time period. Expressed as a line graph. Toggle the blue Closed box in the legend to display only this data on the chart.

Available incidents: the total number of incidents that were in an open state over the selected period. Combines the number of previously opened incidents (carried over when the selected time period began) with the aggregate number of newly opened incidents (the combined value of all the green bars).

Incidents closed: the aggregated number of incidents closed in the selected period.

Closure rate: Incidents closed, expressed as a percentage of the total number of incidents (Available incidents).
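The relationship between these sidebar metrics can be sketched as follows. This is a minimal illustration that assumes the closure rate uses Available incidents as its denominator; the function and parameter names are hypothetical and do not come from the product.

```python
from typing import List, Optional


def available_incidents(carried_over: int, newly_opened: List[int]) -> int:
    """Incidents carried over at the start of the period plus the sum of
    newly opened incidents for each time unit (the green bars)."""
    return carried_over + sum(newly_opened)


def closure_rate(carried_over: int, newly_opened: List[int],
                 closed: int) -> Optional[float]:
    """Incidents closed as a percentage of available incidents.

    Returns None when there were no incidents, because a percentage
    is undefined in that case (an assumption of this sketch).
    """
    available = available_incidents(carried_over, newly_opened)
    if available == 0:
        return None
    return 100.0 * closed / available


# Example: 5 incidents carried over, then 3, 2, and 0 opened on
# successive days, with 4 incidents closed in the period.
rate = closure_rate(5, [3, 2, 0], 4)
```

With these example numbers there are 10 available incidents, so 4 closed incidents give a closure rate of 40 percent.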

Noise reduction

The Noise reduction chart displays how IBM Cloud Pak for AIOps reduces the number of IT events and alerts that your operations staff must evaluate, speeding up recovery time and reducing employee fatigue.

Figure. Noise reduction dashboard

Note: Updated events that were first created more than 30 days earlier are not counted in the Noise reduction chart. For more information, see this Known issue.

Chart definitions:

Noise reduction occurs in three phases, as depicted in the chart:

  1. Events: Duplicate or irrelevant events are removed.
  2. Alerts: Events are correlated and grouped into alerts based on when they occurred and the relationships between the associated IT resources.
  3. Incidents: Alerts are grouped into prioritized incidents that also include insights and potential solutions to help drive incident remediation.

The Noise reduction percentage is the reduction that occurs in the transition from the original number of events that are generated to the final number of incidents.
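One plausible reading of this percentage is the share of original events that no longer needs individual attention once they are consolidated into incidents. The sketch below assumes that formula; the exact calculation that the dashboard uses is not documented here, and the function name is hypothetical.

```python
from typing import Optional


def noise_reduction_percent(events: int, incidents: int) -> Optional[float]:
    """Reduction from the original event count to the final incident count.

    Assumed formula: (events - incidents) / events * 100.
    Returns None when there are no events, because no reduction
    can be computed in that case.
    """
    if events <= 0:
        return None
    return 100.0 * (events - incidents) / events


# Example: 1000 events end up grouped into 10 incidents.
reduction = noise_reduction_percent(1000, 10)
```

Under this assumption, 1000 events that are reduced to 10 incidents correspond to a 99 percent noise reduction.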

IBM Cloud Pak for AIOps includes three noise reduction algorithms, two of which require no pre-training. You can also create policies to suppress events and precisely control how alerts are grouped into incidents. For more information, see About event grouping, Deduplication, and About policies.

Incidents and noise reduction:

Leveraging Incidents is fundamental to deriving value from IBM Cloud Pak for AIOps. If you limit the number of policies that create incidents, by default you also limit the level of noise reduction that can be tracked and presented in the Noise Reduction chart.

Policies connect events and alerts to incidents. Without that connection, the value of the Noise Reduction chart is unnecessarily restricted. It can still display the number of Events and Alerts but not the level of Noise Reduction, which will instead display as empty (double dashes).

Figure. Noise reduction dashboard with incidents restricted

Note: Alerts that surface in Resource Management may not flow into the Noise Reduction chart. This is because they may not be part of a policy that directly ties them to an incident.

Runbook usage

Figure. Runbook usage

Use the Runbook usage chart to view the number of automated versus manual runbook runs and their success rates.

Chart definitions:

Automated runs: Runbooks that were run without any operator interaction, in the selected period. Toggle the purple Automated runs box in the legend to display only this data on the chart.

Manual runs: Runbooks that were implemented manually by an operator who followed an exact procedure, in the selected period. Toggle the green Manual runs box in the legend to display only this data on the chart.

Total runs: The combined number of automated and manual runs, in the selected period.

Success rate: The percentage of automated and manual runbooks successfully implemented.

Runbook automation: The percentage of total runs that were automated, that is, runs that needed no operator interaction.
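The runbook metrics defined above relate to each other as simple ratios. The following sketch shows that arithmetic under the assumption that both percentages use Total runs as the denominator; the function and key names are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional


def runbook_metrics(automated_runs: int, manual_runs: int,
                    successful_runs: int) -> Dict[str, Optional[float]]:
    """Derive the sidebar metrics from raw run counts.

    Assumes both percentages are taken over the total number of runs.
    """
    total = automated_runs + manual_runs
    if total == 0:
        # No runs in the selected period: percentages are undefined.
        return {"total_runs": 0, "success_rate": None,
                "runbook_automation": None}
    return {
        "total_runs": total,
        "success_rate": 100.0 * successful_runs / total,
        "runbook_automation": 100.0 * automated_runs / total,
    }


# Example: 30 automated runs, 10 manual runs, 36 successful runs overall.
metrics = runbook_metrics(30, 10, 36)
```

In this example there are 40 total runs, which gives a success rate of 90 percent and a runbook automation of 75 percent.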

Chart tooltips

White dots highlight particular points on a chart. Click a chart, and then click one of these dots to display a tooltip data box for that point in time. Note: Tooltips are not available for the Noise reduction chart because it does not use a line graph.

Figure. Tooltip

Tooltip information is also available for metrics in the sidebar of a chart.

Figure. Tooltip sidebar