Triggering conditions for storage system alerts

You can set up IBM Spectrum® Control so that it examines the attributes, capacity, and performance of a storage system and notifies you when changes or violations are detected.

Important: Not all the attributes upon which you can alert are listed here. To view a complete list of attributes upon which you can alert, go to Settings > Alert Policies. Double-click a default policy for a storage system. Click Edit Alert Definitions on the Alert Definitions tab. View the attributes that are available in the general, capacity, and performance categories. Note that the attributes that are automatically configured for alerts in the default alert policies, or default alerts, have a status of Active.

In the tables, default alerts are marked with an asterisk (*).

Tips:
  • The type of storage system determines which attributes and performance conditions are available for alerts. For example, triggering conditions for shares are available only for storage systems that are configured for file storage, such as Storwize® V7000 Unified.
  • For capacity attributes, you can generate alerts when the amount of storage is greater than, less than, or equal to a specified value. You can also determine the unit of measurement for the attribute, such as KiB, MiB, GiB, or TiB.
  • If you are doing tasks where many volumes are being deleted, you might want to temporarily disable alerts that use the Deleted Volume attribute. For example, you might want to disable Deleted Volume alerts temporarily if you are doing maintenance tasks or decommissioning storage.
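
The capacity comparison described in these tips can be sketched as follows. This is an illustration only, not how IBM Spectrum Control evaluates alerts; the names `UNIT_BYTES`, `OPERATORS`, and `capacity_condition_met` are hypothetical.

```python
import operator

# Binary (IEC) unit multipliers: 1 KiB = 1024 bytes, and so on.
UNIT_BYTES = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Comparison operators that a capacity condition might use (hypothetical keys).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def capacity_condition_met(value_bytes, op, threshold, unit):
    """Return True when value_bytes satisfies op against threshold given in unit."""
    return OPERATORS[op](value_bytes, threshold * UNIT_BYTES[unit])
```

For example, `capacity_condition_met(600 * 1024**3, ">=", 500, "GiB")` evaluates a 600 GiB value against a 500 GiB threshold.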

Performance alert conditions for storage systems

Define alerts that notify you when the performance of a storage system falls outside a specified threshold. In alerts, you can specify conditions based on metrics that measure the performance of volumes, disks, ports, and nodes. By creating alerts with performance conditions, you can be informed about potential bottlenecks in your storage infrastructure.

Examples:
  • You can define an alert to be notified when the average number of I/O operations per second for read and write operations on a storage system's volumes is greater than or equal to a specified threshold. Use this alert to be notified when the workload of a volume is high and you might need to balance that load across other volumes to improve overall performance.
  • You can define an alert to be notified when the percentage of the average response time that can be attributed to delays from host systems is greater than or equal to a specified threshold. Use this alert to be notified of slow hosts and fabrics that might not be working efficiently.
  • You can also define an alert that notifies you when a metric is less than a specified threshold, for example to identify volumes that might be underused.

Tips:
  • The type of storage system determines the metrics that can be alerted upon. For a list of the metrics that are available for each type of storage system, see Performance metrics.
  • A performance monitor must collect data about a resource before IBM Spectrum Control can determine whether a threshold is violated and an alert is generated for a performance condition.
Best practice: When you set thresholds for performance conditions, try to determine the best value so you can derive the maximum benefit without generating too many false alerts. Because suitable thresholds are highly dependent on the type of workload that is being run, hardware configuration, the number of physical disks, exact model numbers, and other factors, there are no easy or standard default rules.

A recommended approach is to monitor the performance of resources for a number of weeks and then use this historical data to determine reasonable threshold values for each performance condition. After that, you can fine-tune the condition settings to minimize the number of false alerts.
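
One possible way to turn historical data into a starting threshold is to take a high percentile of the collected samples. The following is a minimal sketch under that assumption. The function name `suggest_threshold` and the choice of the 95th percentile are illustrative and are not prescribed by IBM Spectrum Control.

```python
import statistics


def suggest_threshold(samples, percentile=95):
    """Return the given percentile of historical metric samples.

    The percentile is an assumed starting point for tuning, not a
    documented default.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    return statistics.quantiles(samples, n=100)[percentile - 1]
```

A threshold derived this way is only a first estimate; it still needs the fine-tuning that the preceding paragraph describes.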

Capacity alert conditions for storage systems

Capacity metadata is aggregated and collected by probes. By default, this metadata is collected once every 24 hours.

Table 1. Triggering attributes and conditions for capacity changes on storage systems

Capacity Attributes Triggering Conditions for Attributes

Adjusted Used Capacity

The amount of capacity that can be used without exceeding the capacity limit. For example, you set a capacity limit of 80% for your storage systems. You want to get an informational alert when the adjusted used capacity reaches 60% and a critical alert when the adjusted used capacity reaches 80%. So, you define an informational alert with these parameters:
Adjusted Used Capacity ≥ 60%
And, you define a critical alert with these parameters:
Adjusted Used Capacity ≥ 80%
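
The two alert definitions in this example can be sketched as a small severity check. This is an illustration of the thresholds only; the function name and the returned severity labels are hypothetical.

```python
def adjusted_used_capacity_severity(adjusted_used_pct, info_at=60.0, critical_at=80.0):
    """Return the most severe alert level that the value triggers, or None."""
    # Check the higher threshold first so that 80% or more reports as critical.
    if adjusted_used_pct >= critical_at:
        return "critical"
    if adjusted_used_pct >= info_at:
        return "informational"
    return None
```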

Available Capacity (Previously known as Available Pool Space)

The total amount of the space in the pools that is not allocated to the volumes in the pools. To calculate available capacity, the following formula is used:
(pool capacity − used capacity)

For XIV® systems, pool capacity is the physical capacity of the pools and does not include the provisioned capacity of the pools.

Available Written Capacity (Previously known as Effective Available Capacity)

The total amount of the provisioned capacity in the pools that is not allocated to the volumes in the pools.

Capacity (Previously known as Pool Capacity)

The amount of space in the pools on the storage system that is available for creating volumes.

Capacity-to-Limit

The amount of capacity that is available for storing data before the capacity limit is reached. For example, if you set a capacity limit, you can define a warning alert for when the available capacity, in relation to the capacity limit, falls to or below a value that you specify, such as:
Capacity-to-Limit ≤ 500 GiB

Compression Savings

The estimated percentage of capacity that is saved by using data compression, across all pools on the storage system. The percentage is calculated across all compressed volumes in the pools and does not include the capacity of non-compressed volumes. Inline compression is a software feature that is supported by IBM FlashSystem A9000, IBM FlashSystem A9000R, IBM Storage Accelerate™, XIV storage systems with firmware version 11.6 or later, and resources that run IBM Storage Virtualize™.

Deduplication Savings

The estimated percentage of capacity that is saved by using data deduplication, across all data reduction pools on the storage system. The percentage is calculated across all deduplicated volumes in the pools and does not include the capacity of volumes that are not deduplicated. Available for IBM FlashSystem A9000, IBM FlashSystem A9000R, and resources that run IBM Storage Virtualize 8.1.3 or later.

File System Capacity

The total capacity on all of the file systems on the storage system or filer.

Mapped Capacity (Previously known as Assigned Volume Space)

The total volume space in the storage system that is mapped or assigned to host systems, including child pool capacity.

Overprovisioned Capacity (Previously known as Unallocatable Volume Space)

The capacity that cannot be allocated to volumes because the physical capacity of the pools cannot meet the demands for provisioned capacity.

IBM Spectrum Control uses the following formula to determine this value: Provisioned Capacity − Capacity

Available only for thin-provisioned volumes.

Provisioned Capacity (Previously known as Total Volume Capacity)

The total storage space on all the volumes in pools. For thin-provisioned and compressed volumes, this value includes provisioned capacity. For volumes with parent pools, this value includes child pool capacity.

Raw Capacity (Previously known as Raw Disk Capacity)

The total unformatted disk capacity of a storage system. When this value is calculated, IBM Spectrum Control does not include the capacity of storage system disks that become missing after data collection.

Total Capacity Savings (Previously known as Total Data Reduction Savings)

The estimated percentage of capacity that is saved by using data deduplication, data compression, and thin provisioning.

Available for IBM FlashSystem A9000, IBM FlashSystem A9000R, IBM Storage Accelerate, XIV storage systems with firmware version 11.6 or later, and resources that run IBM Storage Virtualize.

Reserved Capacity (Previously known as Reserved Pool Space)

The amount of unused capacity in the pool that is reserved for provisioning and optimization tasks.

Pool capacity is reserved when a provisioning or optimization task is created, and used when the task is run.

Safeguarded Capacity

The total amount of capacity that is used to store volume backups that are created by the Safeguarded Copy feature in DS8000®.

Shortfall

The difference between the amount of provisioned capacity that is committed to the volumes in the pools and the actual physical space that is available in the pools. As the provisioned capacity is allocated to the thin-provisioned and compressed volumes, the shortfall increases and becomes more critical.

This value is determined by the formula, Overprovisioned Capacity ÷ Committed but Unused Capacity

For example, the physical capacity of the pools is 70 GiB, but 150 GiB of provisioned capacity was committed to the thin-provisioned volumes. If the volumes are using 50 GiB, then there is still 100 GiB committed to those volumes (150 GiB − 50 GiB) with only 20 GiB of available pool capacity (70 GiB − 50 GiB). Because only 20 GiB of the pool capacity is available, 80 GiB of the committed capacity cannot be allocated (100 GiB − 20 GiB). In this case, 80% of the committed capacity is not available (80 GiB ÷ 100 GiB).
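
The arithmetic in this example can be checked with a short sketch. The function name and parameter names are hypothetical, and the result is expressed as a percentage. All three inputs must use the same unit.

```python
def shortfall_percent(pool_capacity, provisioned_capacity, used_capacity):
    """Return the share of committed but unused capacity that cannot be allocated."""
    committed_unused = provisioned_capacity - used_capacity  # 150 - 50 = 100
    available = pool_capacity - used_capacity                # 70 - 50 = 20
    # Equivalent to Provisioned Capacity - Capacity when the pools are overprovisioned.
    unallocatable = max(committed_unused - available, 0)     # 100 - 20 = 80
    if committed_unused <= 0:
        return 0.0
    return unallocatable * 100 / committed_unused


print(shortfall_percent(70, 150, 50))  # prints 80.0
```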

Snapshot Space

The amount of space that is used by all of the snapshots of the file systems that are associated with the IBM Spectrum Scale cluster.

Unreserved Capacity (Previously known as Unreserved Pool Space)

The amount of space in storage system pools that is not allocated for volumes, and is not reserved by pending or scheduled provisioning tasks.

Unmapped Capacity

The total volume space in the storage system that is not mapped or assigned to host systems.

Unused Volume Capacity (Previously known as Effective Unallocated Volume Space)

The amount of the provisioned capacity in the storage pool that is not used.

Used Capacity (%) (Previously known as Physical Allocation)

The percentage of physical capacity in pools that is allocated to volumes, including child pools. The value is always less than or equal to 100% because you cannot allocate more physical capacity to the volumes than is available in the pools. This value is determined by the formula, Used Capacity ÷ Capacity × 100. For example, if the capacity that is reserved for volumes is 50 GiB for a volume size of 200 GiB, used capacity is 25%.
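
The percentage formula can be sketched as follows. The function name is hypothetical, and the sketch assumes that the 200 GiB in the example is the capacity that serves as the denominator.

```python
def used_capacity_percent(used_capacity, capacity):
    """Return Used Capacity ÷ Capacity × 100 for values in the same unit."""
    if capacity <= 0:
        raise ValueError("capacity must be greater than zero")
    return used_capacity * 100 / capacity


print(used_capacity_percent(50, 200))  # prints 25.0
```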

Used Capacity (Previously known as Used Pool Space)

The capacity in the pool that is allocated to and used by volumes.

Used Written Capacity (%) (Previously known as Effective Used Capacity)

The percentage of capacity that is provisioned to the standard-provisioned volumes and the thin-provisioned volumes, given the drive compression savings.

Used Written Capacity (Previously known as Effective Used Capacity)

The total amount of provisioned capacity that is used by all volumes, given the drive compression savings.

Written Capacity Limit (Previously known as Effective Capacity)

The amount of provisioned capacity that can be created, given the drive compression savings.

General alert conditions for storage systems

Asset, capacity, and configuration metadata is aggregated and collected when probes collect storage system metadata. By default, metadata is collected once every 24 hours.

Table 2. Triggering attributes and conditions for general changes on storage systems

General Attributes Triggering Conditions for Attributes

Firmware

The firmware version of the microcode on a storage system. For the DS-series of storage systems, this value represents the SEA version of the firmware.

To view information about the code bundles for the firmware versions of the DS-series, go to IBM® Support and search for code bundle information. An internet connection is required to access the support site.

Last Successful Probe

Last Successful Monitor

A specified amount of time has passed since a probe or performance monitor was able to collect data about a storage system. You can use this alert to be notified when up-to-date configuration, status, or performance data is not being collected about a storage system and its existing data might be stale. This situation might occur if the resource, network, or IBM Spectrum Control server is unavailable.
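
The elapsed-time condition can be sketched as a simple staleness check. This is an illustration only; the function name `data_is_stale` is hypothetical, and the time limit is whatever value you choose for the alert.

```python
from datetime import datetime, timedelta, timezone


def data_is_stale(last_success, max_age, now=None):
    """Return True when more than max_age has passed since last_success."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_success > max_age
```

For example, with probes that run once every 24 hours, a limit such as `timedelta(hours=36)` (an assumed value) would flag a storage system after one missed collection.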

Performance Monitor Status*

One of the following statuses is detected for a performance monitor:
Not Normal
An error or warning occurred during data collection. This status indicates that a performance monitor did not collect any data, or only collected a partial set of data about a resource.
Warning
A performance monitor completed, but did not collect a complete set of performance data. This status might occur if the resource was rebooted during data collection, no valid performance data was provided by the resource, or a communication error occurred with the resource or its associated agent.
Error
A performance monitor did not complete when it attempted to collect performance data about the resource. This status might occur if the resource cannot be reached during data collection, or if no configuration data is available for the resource.
For details about why a specific status occurred, check the log for the performance monitor. To check the log, go to the details page for a resource, click Data Collection, and select Actions > Open Logs in the Performance Monitor section on the Data Collection page.

Probe Status*

One of the following statuses is detected for a probe:
Not Successful
An error or warning occurred during data collection. This status indicates that a probe did not collect any data, or only collected a partial set of data about a resource.
Warning
A probe completed, but might not have collected a complete set of data. This status might occur if data cannot be collected about one or more of the internal resources of a resource.
Error (default)
A probe did not complete when it attempted to collect asset data about the resource. This status might occur if the resource cannot be reached during data collection.
For details about why a specific status occurred, check the log for the probe. To check the log, go to the details page for a resource, click Data Collection, and select Actions > Open Logs in the Probe section on the Data Collection page.

Status

One of the following statuses is detected for a storage system:
Not Normal
An error or warning status was detected for a storage system.
Warning
A warning status was detected for a storage system. This status might occur if a storage system comes online or if its version changes.
Error
An error status was detected for a storage system. This status might occur if the cooling fans in a storage system are stopped and the internal temperature is too high or if a storage system goes offline.
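
The relationship between the status conditions in this table can be sketched as follows: Not Normal (or Not Successful for probes) matches either a warning or an error, while Warning and Error match only themselves. The names in this sketch are hypothetical and do not represent a product API.

```python
# Conditions that match any non-normal result (warning or error).
AGGREGATE_CONDITIONS = {"Not Normal", "Not Successful"}


def status_condition_met(condition, detected_status):
    """Return True when the detected status satisfies the alert condition."""
    if condition in AGGREGATE_CONDITIONS:
        return detected_status in ("Warning", "Error")
    return detected_status == condition
```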