Triggering conditions for server alerts

You can set up IBM Spectrum Control so that it examines the attributes and capacity of a server and notifies you when changes are detected.

Alerts can notify you of general changes and capacity changes on the following resources:
  • Servers
  • Controllers
  • Disks
  • Disk Groups
  • File Systems and Logical Volumes
  • Paths
  • Shares

Important: Other attributes, based on the key properties of a server, are also available for alerts. To view the complete list of server attributes on which you can alert, go to Settings > Alert Policies, double-click a default policy for a server, and click Edit Alert Definitions on the Alert Definitions tab. The available attributes are listed in the general and capacity categories. Attributes that are automatically configured for alerts in the default alert policies (default alerts) have a status of Active.

In the tables, default alerts are marked with an asterisk (*).

Tip: For capacity attributes, you can generate alerts when the amount of storage is greater than, less than, or equal to a specified value. You can also determine the unit of measurement for the attribute, such as KiB, MiB, GiB, or TiB.
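
The comparison that the tip describes can be illustrated with a minimal Python sketch. It is not product code: the function names, the operator symbols, and the byte-based representation are assumptions made for illustration only.

```python
import operator

# Binary unit multipliers (1 KiB = 1024 bytes).
UNIT_BYTES = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Hypothetical operator symbols; the labels in the product GUI may differ.
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def to_bytes(value, unit):
    """Convert a value expressed in KiB, MiB, GiB, or TiB to bytes."""
    return value * UNIT_BYTES[unit]


def capacity_alert(measured_bytes, op, threshold, unit):
    """Return True when the measured capacity satisfies the alert condition."""
    return OPERATORS[op](measured_bytes, to_bytes(threshold, unit))


# Example: check whether 400 GiB of available capacity is below a 500 GiB threshold.
available = to_bytes(400, "GiB")
print(capacity_alert(available, "<", 500, "GiB"))
```

Normalizing both sides to bytes before comparing keeps the check independent of the unit that was chosen for the threshold.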

Servers

Table 1. Triggering attributes and conditions for general changes on servers
General Attributes Descriptions of Triggering Conditions

Agent State*

A Storage Resource agent is in one of the following states:
Not Normal
An error or warning state was detected on a Storage Resource agent.
Warning
A warning state was detected on a Storage Resource agent. For example, a warning state might occur when an agent must be upgraded to the same version level as the IBM Spectrum Control server with which it communicates.
Error (default)
An error state was detected on a Storage Resource agent. For example, an error state might occur when an agent was not able to be upgraded.

Last Successful Probe

A specified amount of time has passed since a probe was able to collect data about a server. You can use this alert to be notified when up-to-date configuration and status data is not being collected about a server and its existing data might be stale. Data collection might be interrupted or not occur if the resource, network, or IBM Spectrum Control server is unavailable.

Probe Status*

One of the following statuses is detected for a probe:
Not Successful
An error or warning occurred during data collection. This status indicates that a probe did not collect any data, or only collected a partial set of data about a resource.
Warning
A probe completed, but might not have collected a complete set of data. This status might occur if data cannot be collected about one or more of the internal resources of a resource.
Error (default)
A probe did not complete when it attempted to collect asset data about the resource. This status might occur if the resource cannot be reached during data collection.
For details about why a specific status occurred, check the log for the probe. To check the log, go to the details page for a resource, click Data Collection, and select Actions > Open Logs in the Probe section on the Data Collection page.

Status*

One of the following statuses is detected on a server:
Not Normal
An error or warning status was detected on the server or its internal resources.
Warning
A warning status was detected on the server or its internal resources. For example, a warning status might occur when an HBA or HBA to a server node is newly discovered, is missing, or is rediscovered.
Error (default)
An error status was detected on the server or its internal resources. For example, an error status might occur when a server goes offline, when a server disk is disconnected, or when a server disk with multiple paths is partially disconnected because one of its paths is disconnected.
Unreachable
One or more of the monitored resources for a server are not responding. This status might be caused by a problem in the network or by a Storage Resource agent that is no longer running and did not communicate that it was shutting down.
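
As a rough illustration of the conditions in Table 1, the following Python sketch shows how the combined conditions (Not Normal, Not Successful) cover both the Warning and Error statuses, and how a Last Successful Probe check compares elapsed time with a specified limit. The names and data structures are hypothetical and are not part of IBM Spectrum Control.

```python
from datetime import datetime, timedelta

# Detected statuses that satisfy each alert condition, following Table 1.
# "Not Normal" and "Not Successful" both cover warnings and errors.
CONDITION_MATCHES = {
    "Not Normal": {"Warning", "Error"},
    "Not Successful": {"Warning", "Error"},
    "Warning": {"Warning"},
    "Error": {"Error"},
    "Unreachable": {"Unreachable"},
}


def status_alert(condition, detected_status):
    """Return True when the detected status satisfies the alert condition."""
    return detected_status in CONDITION_MATCHES[condition]


def probe_is_stale(last_success, max_age, now=None):
    """Return True when more than max_age has passed since the last successful probe."""
    now = now or datetime.now()
    return now - last_success > max_age


# Example: a Warning status triggers a Not Normal alert but not an Error alert.
print(status_alert("Not Normal", "Warning"), status_alert("Error", "Warning"))

# Example: a probe that last succeeded two days ago exceeds a 24-hour limit.
print(probe_is_stale(datetime(2024, 1, 1), timedelta(hours=24), now=datetime(2024, 1, 3)))
```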

Table 2. Triggering attributes and conditions for capacity changes on servers
Capacity Attributes Descriptions of Triggering Conditions

Available Drive Capacity (Previously known as Available Disk Space)

The unused disk capacity on the local and SAN-attached storage for the server. SAN-attached storage is assigned to the server from storage systems.

Available File System Capacity

The amount of unused capacity in the file systems on the server.

Available file system capacity does not include capacity that is reserved for the operating system. For example, the available capacity for tmpfs on UNIX operating systems is not included in this value.

Drive Capacity (Previously known as Total Disk Space)

The total disk capacity for all the local and SAN-attached storage on the server. SAN-attached storage is assigned to the server from storage systems.

File System Capacity

The amount of file system capacity on the server.

File System Capacity from Storage Systems

The amount of file system capacity that is assigned to the server from storage systems.

The file system capacity from storage systems is only available when SAN-attached storage is assigned to the server.

Mapped SAN Capacity (Previously known as Assigned SAN Space)

The amount of disk capacity that is assigned to the server from storage systems.

The disk capacity from storage systems is only available when SAN-attached storage is assigned to the server.

Used Capacity

The amount of used disk capacity on the local and SAN-attached storage for the server. SAN-attached storage is assigned to the server from storage systems.

Controllers

Table 3. Triggering attributes and conditions for changes on disk controllers
Controller Attributes Descriptions of Triggering Conditions
  • Driver Version
  • Firmware
  • ROM Version

The version of the driver, firmware, or read-only memory (ROM) on a disk controller changes. You can use a number of operators to determine when you are notified of a version change, such as when the driver, firmware, or ROM is, or is not, a specific version, or when the version number contains a specific value.

Use this alert for HBAs only.

Last Data Collection

A specified amount of time has passed since data was collected about a controller. Use this alert to be notified if data is not being collected about a controller or if the existing data is becoming stale.

New Disk Controller

A disk controller is detected for the first time. Use this alert to be notified of hardware additions on servers.

Removed Disk Controller

A previously monitored disk controller can no longer be found. Historical data about the controller is retained, but no current data is being collected. Use this alert to be notified if a controller is removed or becomes unavailable.

Status*

One of the following statuses is detected on a disk controller:
Not Normal
An error or warning status was detected on the controller.
Warning
A warning status was detected on the controller.
Error (default)
An error status was detected on the controller.
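
The version operators that are described for the Driver Version, Firmware, and ROM Version attributes in Table 3 (is, is not, contains) can be sketched in Python as follows. The function name and the operator strings are assumptions for illustration; they do not represent a product API.

```python
def version_alert(current_version, op, value):
    """Evaluate a version condition with a hypothetical operator name."""
    if op == "is":
        return current_version == value
    if op == "is not":
        return current_version != value
    if op == "contains":
        return value in current_version
    raise ValueError(f"unsupported operator: {op}")


# Example: check whether an HBA firmware string contains "11.4".
# The version string is made up for this example.
print(version_alert("11.4.204.34", "contains", "11.4"))
```

Versions are compared as plain strings here, so "contains" is a substring match rather than a numeric comparison.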

Disks

Table 4. Triggering attributes and conditions for general changes on disks
General Attributes Descriptions of Triggering Conditions

Firmware

The version of the Licensed Internal Code on the disk changes. You can use a number of operators to determine when you are notified of a firmware change, such as when the firmware is, or is not, a specific version, or when the version number contains a specific value.

Multipathing Policy

The multipathing policy that is in effect for a disk. For example, you can be notified when the policy changes, or when the policy is Round Robin, Load Balancing, Failover Only, or another policy.

New Disk

A disk is detected for the first time. Use this alert to be notified of hardware changes on servers or hypervisors.

Paths

The number of access paths that are associated with the disk falls outside a specified range, or is equal to or not equal to a specified value.

Removed Disk

A previously monitored disk can no longer be found. Historical data about the disk is retained, but no current data is being collected. Use this alert to be notified if a disk is removed or becomes unavailable.

Status*

One of the following statuses is detected on a disk:
Not Normal
An error or warning status was detected on the disk.
Warning
A warning status was detected on the disk.
Error (default)
An error status was detected on the disk.
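
The numeric conditions that are described for the Paths attribute in Table 4 (outside a specified range, equal to, or not equal to a value) can be sketched in Python as follows. The operator names are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
def count_alert(count, op, value=None, low=None, high=None):
    """Evaluate a numeric condition such as the number of paths to a disk."""
    if op == "outside range":
        # Assumption: low and high are inclusive bounds of the accepted range.
        return count < low or count > high
    if op == "equal":
        return count == value
    if op == "not equal":
        return count != value
    raise ValueError(f"unsupported operator: {op}")


# Example: a disk that is expected to have 2 to 4 paths currently has 1.
print(count_alert(1, "outside range", low=2, high=4))
```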

Table 5. Triggering attributes and conditions for capacity changes on disks
Capacity Attributes Descriptions of Triggering Conditions

Available Drive Capacity (Previously known as Available Disk Space)

The unused capacity on a disk that is attached to the server.

Capacity

The total amount of storage capacity on a disk that is attached to the server.

Used Capacity

The amount of used storage capacity on a disk that is attached to the server.

Disk Groups

Table 6. Triggering attributes and conditions for general changes on disk groups (volume groups)
General Attributes Descriptions of Triggering Conditions

Deleted Volume Group

A previously monitored volume group can no longer be found. Historical data about the volume group is retained, but no current data is being collected. Use this alert to be notified if a volume group is deleted or becomes unavailable.

New Volume Group

A volume group is detected for the first time.

Status*

One of the following statuses is detected on a disk group:
Not Normal
An error or warning status was detected on the disk group.
Warning
A warning status was detected on the disk group.
Error (default)
An error status was detected on the disk group.

Table 7. Triggering attributes and conditions for capacity changes on disk groups (volume groups)
Capacity Attributes Descriptions of Triggering Conditions

Available Capacity

The unused storage capacity on a server disk group.

Used Capacity

The amount of used storage capacity on a server disk group.

Volume Group Capacity

The total amount of storage capacity on a server volume group. This value is inclusive of all storage capacity and applies to all capacity values related to volume groups.

File Systems and Logical Volumes

Table 8. Triggering attributes and conditions for general changes on file systems and logical volumes
General Attributes Descriptions of Triggering Conditions

Available Inodes

The number of unused inodes on file systems on the operating system changes, falls outside a specified range, or is equal to or not equal to a specified value.

Deleted File System

A previously monitored file system is deleted or unmounted from a server. Historical data about the file system is retained, but no current data is being collected. This attribute applies to file systems on the following resources:
  • Storage systems that are configured for file storage, including Storwize V7000 Unified
  • Servers that are managed by Storage Resource agents

Deleted Logical Volume

A previously monitored logical volume can no longer be found. Historical data about the logical volume is retained, but no current data is being collected. Use this alert to be notified if a logical volume is removed or becomes unavailable.

New File System

A file system is detected for the first time. This alert applies to file systems on the following resources:
  • Storage systems that are configured for file storage, including Storwize V7000 Unified
  • Servers that are managed by Storage Resource agents

New Logical Volume

A logical volume is detected for the first time.

Used Inodes

The number of used inodes on file systems on the operating system changes. You can use a number of operators to determine when you are notified, such as when the number of used inodes falls outside a specified range, or is equal to or not equal to a specified value.

Table 9. Triggering attributes and conditions for capacity changes on file systems and logical volumes
Capacity Attributes Descriptions of Triggering Conditions

Available File System Capacity

The amount of unused storage capacity on a file system on the server disk.

File System Capacity

The total amount of storage capacity on a file system on the server disk.

Logical Volume Capacity

The total amount of storage capacity on a logical volume on the server disk.

Used File System Capacity

The amount of used storage capacity on a file system on the server disk.

Used Capacity

The percentage of used storage capacity on a file system or logical volume on the server disk.
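
Because the Used Capacity attribute in Table 9 is expressed as a percentage, a threshold check first derives that percentage from the used and total capacity. The following Python sketch shows the arithmetic. The function names and the use of a "greater than or equal to" comparison are assumptions for illustration.

```python
def used_percent(used, capacity):
    """Return used capacity as a percentage of total capacity (same units)."""
    if capacity <= 0:
        raise ValueError("capacity must be greater than zero")
    return 100.0 * used / capacity


def used_capacity_alert(used, capacity, threshold_percent):
    """Return True when the used percentage reaches the threshold."""
    return used_percent(used, capacity) >= threshold_percent


# Example: 180 GiB used on a 200 GiB file system, checked against an 85% threshold.
print(used_percent(180, 200), used_capacity_alert(180, 200, 85))
```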

Paths

Table 10. Triggering attributes and conditions for general changes on paths
Path Attributes Descriptions of Triggering Conditions

Deleted Path

A previously monitored access path for a server disk can no longer be found. This change might or might not affect the availability of the disk because there might be more than one path available.

New Path

An access path for a disk is detected for the first time.

Status*

One of the following statuses is detected on a path:
Not Normal
An error or warning status was detected on the path.
Warning
A warning status was detected on the path.
Error (default)
An error status was detected on the path.

Shares

Table 11. Triggering attributes and conditions for general changes on shares
Share Attributes Descriptions of Triggering Conditions

Deleted Share

A previously monitored share can no longer be found. Historical data about the share is retained, but no current data is being collected. Use this alert to be notified if a share is removed or becomes unavailable.

New Share

A share is detected for the first time.

Triggering conditions for the IBM Spectrum Control server

The server on which IBM Spectrum Control is installed is automatically monitored for conditions that might cause an interruption in product functions. When these conditions are detected, alerts are triggered and shown on the Home > Alerts page. You do not need to manually define alerts for these product-related conditions; they are automatically enabled.

Table 12. Triggering conditions for the IBM Spectrum Control server
Triggering Condition | Explanation | Related Error Message
Database unavailable | The product database is not available. This database is the repository for information that is collected about the monitored resources in your environment. | ALR4112E, ALR4113E
High memory usage* | A high amount of memory is being used by a server process and might cause stability problems. | ALR4103W
Database alarm* | The system database or the database manager that hosts the product's database repository is reporting an alarm. | ALR4104W
High workload | The workload queue for the Device server is high and might cause performance issues. | ALR4105W
High number of external events | The server is receiving a high number of external events, such as CIM indications or SNMP traps. The high number of events might cause performance issues. | ALR4106W