Monitoring the status and condition of resources

Monitor the operational health of storage systems, fabrics, and switches and the status of their internal resources. You can also view the status of Fibre Channel ports for disk controllers that are associated with a host. Use this information to identify potential problem areas in a storage environment.

Note:
  • By default, storage system health is determined based on the status of its internal resources. You can also choose to include critical and warning severity alerts in the health determination. If you include alerts, then in IBM Storage Insights Pro, both Storage Insights and device alerts are considered. In the free version, only device alerts are considered. Device alerts are received directly from monitored block storage systems through Call Home with cloud services and are shown in the IBM Storage Insights GUI.
  • To include alerts in storage system health determination, go to Settings > Storage system health, select the alert types that you want to include, and then click Save. If you are in the classic UI, go to Configuration > Settings, select the alert types under the Storage System Condition section, and save the changes. Only users with the Admin role can choose to include alerts in the storage system health determination.
  • In the modern UI of IBM Storage Insights, top-level resources, such as storage systems, display Health, while in the classic UI the same value is shown as Condition. Internal resources, such as disks, display Status in both UIs.
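The alert rules in the notes above can be sketched in Python as follows. This is only an illustration of the described behavior, not product code: the Alert class, the source labels "storage_insights" and "device", and the function name are hypothetical, and the sketch assumes that only critical and warning severities can ever be selected.

```python
from dataclasses import dataclass

# Hypothetical model of an alert; the field values are illustrative labels,
# not identifiers used by IBM Storage Insights.
@dataclass(frozen=True)
class Alert:
    severity: str  # "critical", "warning", or "info"
    source: str    # "storage_insights" or "device"

# Only these severities can be included in the health determination.
SELECTABLE_SEVERITIES = {"critical", "warning"}

def alerts_counted_for_health(alerts, included_severities, is_pro):
    """Return the alerts that would be considered for storage system health.

    included_severities: the alert types an Admin selected (may be empty).
    is_pro: True for IBM Storage Insights Pro, False for the free version.
    """
    severities = set(included_severities) & SELECTABLE_SEVERITIES
    # Pro considers Storage Insights and device alerts; free considers device alerts only.
    sources = {"storage_insights", "device"} if is_pro else {"device"}
    return [a for a in alerts if a.severity in severities and a.source in sources]

sample_alerts = [
    Alert("critical", "device"),
    Alert("warning", "storage_insights"),
    Alert("info", "device"),
]
```

With both alert types selected, the free version in this sketch counts only the critical device alert, while Pro also counts the Storage Insights warning alert. With no alert types selected, no alerts are counted, which matches the default of using only the status of internal resources.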
Table 1. Monitoring the Health and status of resources
  Explanation Steps to view status or condition

Status

The status of a resource that is reported when metadata is collected by IBM Storage Insights. Statuses include Normal, Online, Offline, Degraded Ports, Operational, Error, Stopped, Starting, Completed, Unknown, and other statuses. Use the status to determine the condition of a resource and whether any actions must be taken to correct a problem. For example, if a disk on a storage system is starting, a warning status is reported for that disk by the storage system.
Tip: The status of internal resources is used to determine the operational condition of the associated top-level resources.
    1. Modern UI: Go to the storage system details page and select any resource from the left menu panel. A value is shown in the Status column.
    2. Classic UI: In the menu bar, go to the resource type that you want to view. For example, if you want to view the status of switches, go to Resources > Switches. Right-click a resource and select View Details. A status icon is shown next to the image of its related resource and its internal resources.

Health (Condition in classic UI)

The overall operational health (condition in classic UI) of a storage system. This health represents the most critical status that was detected on the resource itself and on its internal resources. You can also choose to include critical and warning alerts when determining the storage system health.

For example, if an error status was detected on a storage system pool, an error icon is shown for the overall condition of the storage system. If no errors, warnings, or unreachable statuses were detected on a resource or on its internal resources, then a green symbol is shown for the condition of the storage system.

  1. Modern UI:
    Dashboard view
    The Health and data collection widget on the Overview dashboard displays the count of the resources with the relevant health icon. Also, the left pane that lists the resources displays the health icon next to each resource.
    Resource list pages
    Go to Main menu, expand Inventory, and click a resource type, such as Storage systems, Pools, or Fabrics. The overall health of a resource is displayed in the Health column, and aggregated in the health icons in the Health section above the resource table.
  2. Classic UI:
    Dashboard view
    In the menu bar, go to Dashboard > Operations. The icons that show error and warning conditions for the block storage systems are displayed in the tiles.
    Resource list pages
    In the menu bar, go to the resource type that you want to view. For example, if you want to view the condition of block storage systems, go to Resources > Block storage systems.
    The overall condition of a resource is displayed in the Condition column, and aggregated in the condition icons at the top left corner of its details page.
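The health roll-up described in Table 1 (the most critical status detected on the resource itself and on its internal resources) can be sketched as follows. This is a minimal illustration, not the product's implementation. The function name is hypothetical, and the severity ranking (Error above Unreachable above Warning above Normal) is an assumption that mirrors the order in which the statuses are listed in this topic; the source does not state how Error and Unreachable rank against each other.

```python
# Assumed ranking: a higher number means a more critical status.
SEVERITY_RANK = {"Normal": 0, "Warning": 1, "Unreachable": 2, "Error": 3}

def overall_health(own_status, internal_statuses):
    """Return the most critical status among a resource and its internal resources."""
    candidates = [own_status, *internal_statuses]
    return max(candidates, key=lambda status: SEVERITY_RANK[status])

# An error on one pool drives the health of the whole storage system.
pool_statuses = ["Normal", "Error", "Warning"]
system_health = overall_health("Normal", pool_statuses)
```

In this sketch, system_health evaluates to "Error", which corresponds to the example of an error status on a storage system pool. If every status is "Normal", the result is "Normal", the case in which a green symbol is shown.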
IBM Storage Insights provides a number of different icons to help you quickly determine the health of resources.
Table 2. Possible statuses and Health of resources
Icon Status/Health Explanation
Error status

Error

A serious problem was detected on a resource or on its internal resources. Resolve these problems as soon as possible.

Error status

Error - Acknowledged

An Error status was detected and acknowledged. An Error - Acknowledged status means that the issue was reviewed and is either resolved or considered safe to ignore. For internal resources, an Error - Acknowledged status is treated as normal when the condition of related or higher-level resources is determined.

For example, if a disk or drive has an error status, the related storage system also shows an error condition. If the error status of the disk or drive is acknowledged, the status changes to Error - Acknowledged. This acknowledged status is then treated as normal, and is used in evaluating the overall condition of the storage system. In this case, if the other internal resources of the storage system are also normal, the storage system condition is considered normal.

Unreachable status

Unreachable

A resource is not responding to requests from the IBM Storage Insights host. This status might be caused by a problem with the data collector.

Unreachable acknowledged status

Unreachable - Acknowledged

An Unreachable status was detected and acknowledged. An Unreachable - Acknowledged status indicates that a status was reviewed and is either resolved or can be ignored.

For internal resources, an Unreachable - Acknowledged status is treated as normal when the condition of related or higher-level resources is determined.

For example, if the status of a disk or drive is unreachable, the condition of the associated block storage system is also unreachable. If the Unreachable status of the disk is acknowledged, the status changes to Unreachable - Acknowledged. This acknowledged status is then treated as normal, and is used in evaluating the overall condition of the storage system. In this case, if the other internal resources of the storage system are also normal, the storage system condition is considered normal.

Warning status

Warning

A Warning status represents potential problems on a resource or on its internal resources. This status is not critical.

Warning status

Warning - Acknowledged

A Warning status was detected and acknowledged. A Warning - Acknowledged status indicates that a status was reviewed and is either resolved or can be ignored.

For internal resources, a Warning - Acknowledged status is treated as normal when the condition of related or higher-level resources is determined.

For example, if the status of a disk or drive is warning, the condition of the associated block storage system is also warning. If the Warning status of the disk is acknowledged, the status changes to Warning - Acknowledged. This acknowledged status is then treated as normal, and is used in evaluating the overall condition of the storage system. In this case, if the other internal resources of the storage system are also normal, the storage system condition is considered normal.

Normal status

Normal

No warnings or errors were detected on a monitored resource.

Not monitored status

Not Monitored

For hosts, this status is displayed when IBM Storage Insights monitors the storage system that the host is connected to, but the host itself was not added for monitoring. Unmonitored hosts are automatically created based on the host connections of monitored storage systems. Each host connection is represented as an unmonitored host.

For switches, when you add a chassis, its hosted switches are automatically discovered and added for monitoring. Any other switches that are connected to the switches on the monitored chassis are also discovered.
  • If the chassis that host the other connected switches use the same connection credentials as the chassis that you added, those chassis and their switches are also added for monitoring.
  • If the chassis that host the other connected switches don't use the same credentials, those chassis and their switches are added to IBM Storage Insights but are not monitored.
Unknown status

Unknown

A resource is known to IBM Storage Insights but is not monitored. To change an Unknown status, ensure that data is being collected for the resource.
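The effect of acknowledging a status, as described for the acknowledged rows of Table 2, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the product's logic: the function names are hypothetical, statuses are modeled as plain strings in the form shown in the table, and the severity ranking (Error above Unreachable above Warning above Normal) is assumed from the order of the table rows.

```python
# Assumed ranking: a higher number means a more critical status.
SEVERITY_RANK = {"Normal": 0, "Warning": 1, "Unreachable": 2, "Error": 3}

def effective_status(status):
    """Treat any acknowledged status of an internal resource as Normal."""
    return "Normal" if status.endswith("- Acknowledged") else status

def storage_system_condition(internal_statuses):
    """Return the most critical effective status of the internal resources."""
    effective = [effective_status(s) for s in internal_statuses]
    # With no internal resources reported, the sketch falls back to Normal.
    return max(effective, key=SEVERITY_RANK.__getitem__, default="Normal")

before_ack = storage_system_condition(["Normal", "Error"])
after_ack = storage_system_condition(["Normal", "Error - Acknowledged"])
```

Here before_ack is "Error" and after_ack is "Normal": once the disk error is acknowledged and the other internal resources are normal, the storage system condition is considered normal. An unacknowledged status on any other internal resource still determines the result.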