Scenarios for custom alerts

Create custom alerts with one or more conditions that trigger when all the conditions are met for a given resource. By creating a custom alert, you can detect multiple configuration, capacity, and performance conditions together to determine whether an urgent situation occurred in your storage or SAN fabric.

The requirements of your environment determine the custom alerts that you create. For example, your storage systems might run critical production applications on tier 1 storage. In this case, you don't want the performance of the tier 1 storage to fall below a certain threshold. To be notified when that situation occurs, you can create a custom alert that checks if the overall response time of tier 1 storage is too high.

Tip: You can define alerts for storage systems, hosts, fabrics, and switches and their internal resources. However, the resources in an alert must be along the same data path. For example, if you create an alert for a storage system volume, resources along the same data path include the pool of which that volume is part and the host that maps the volume. Switch ports that are used to access that volume are also in the same data path.

Use the following example scenarios as guides to help you create custom alerts for your environment.

Receive alerts for Sustainability values

To monitor the power and temperature Service Level Agreements (SLAs) of your storage system, you want to get notified when the power of your storage system exceeds a certain defined value or SLA.

Solution: Define a custom alert that checks whether Total Power Consumed is greater than or equal to (>=) a value (for example, 70), and set the level of Severity. When you select the check box, the alert severity contributes to the health status of the storage system.
Note: The health status of the storage system is displayed only in the Carbon dashboard and not in the Classic IBM Storage Insights view.

Receive alerts when the response time of storage on a specific tier is too high

Your storage systems run both critical production applications and noncritical test applications. The production applications use tier 1 storage, while the test applications use storage on tiers 2 and 3.

To ensure consistent, top performance for tier 1 storage, you want to be notified when its response time is higher than 6 ms/op so that you can resolve the bottleneck. However, to avoid too many alerts, you do not want to receive notifications when the response time of tier 2 or 3 storage exceeds 6 ms/op.

Solution

Define a custom alert that checks if the volumes used by an application are in Tier 1 pools and if their Overall Response Time is higher than 6 ms/op.

For a storage system, set up a custom alert with the following attributes and conditions:

Alert definition for scenario about high response times on tier 1 storage

Attribute	Condition
Overall Response Time (Custom > Volumes > Performance)	Greater than or equal to (>=) 6 ms/op
Tier (Custom > Pools > General)	Tier 1
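The way the two conditions combine can be sketched in Python. This is only an illustration of the "all conditions must be met" logic, not a Storage Insights API: the `VolumeSample` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VolumeSample:
    """Hypothetical snapshot of one volume (not a product data type)."""
    name: str
    pool_tier: int              # tier of the pool that contains the volume
    overall_response_ms: float  # overall response time in ms/op


def tier1_alert(sample: VolumeSample, threshold_ms: float = 6.0) -> bool:
    # The alert triggers only when BOTH conditions from the table hold.
    return sample.pool_tier == 1 and sample.overall_response_ms >= threshold_ms


samples = [
    VolumeSample("prod_vol", 1, 7.2),   # tier 1 and slow: triggers
    VolumeSample("test_vol", 2, 9.5),   # slow, but tier 2: ignored
    VolumeSample("prod_vol2", 1, 3.1),  # tier 1, but fast: ignored
]
triggered = [s.name for s in samples if tier1_alert(s)]
print(triggered)
```

In this sketch, the tier 2 volume is skipped even though its response time is the highest, which is the noise reduction that the scenario asks for.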

Receive alerts when the response time of volumes is too high during times of active I/O

You want to be notified about high read response times on your volumes, but not when they are caused by cache misses during periods with only a trickle of I/O.

Solution

Define a custom alert with volume-level thresholds that combines checks for response times and I/O.

Alert definition for scenario about high response times on volumes during times of active I/O

Attribute	Condition
Overall Response Time (Custom > Volumes > Performance)	Greater than or equal to (>=) 10 ms/op
Total I/O Rate - overall (Custom > Volumes > Performance)	Greater than or equal to (>=) 50 ops/s
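As a rough sketch of why the I/O rate condition matters, the following Python function applies both thresholds from the table. The function and parameter names are hypothetical, and the I/O rate is assumed to be measured in ops/s.

```python
def active_io_alert(
    response_ms: float,
    io_rate_ops: float,
    response_limit_ms: float = 10.0,  # threshold from the table, ms/op
    io_rate_limit: float = 50.0,      # threshold from the table, assumed ops/s
) -> bool:
    # A slow response time counts only when the volume is actively doing I/O.
    return response_ms >= response_limit_ms and io_rate_ops >= io_rate_limit


# Slow, but only a trickle of I/O (likely cache misses): no alert.
idle_result = active_io_alert(response_ms=25.0, io_rate_ops=2.0)
# Slow while busy: alert.
busy_result = active_io_alert(response_ms=25.0, io_rate_ops=400.0)
print(idle_result, busy_result)
```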

Receive alerts when the response time of volumes is too high, but do not generate these alerts when batch and backup jobs are running

To ensure the consistent, top performance of your volumes, you want to be notified when their response times are becoming too high. However, to avoid too many alerts, you do not want to receive notifications when batch and backup jobs are running on your storage. You understand that these jobs can cause an expected spike in response times and do not require action on your part.

Solution

Define a custom alert that checks if the Read Response Time of volumes exceeds an amount that is more than expected in your environment and the Read Transfer Size is less than 256 KiB/op. Typically, read transfer sizes greater than 256 KiB/op indicate that batch or backup jobs are running in the background.

For a storage system, set up a custom alert with the following attributes and conditions:

Alert definition for scenario about slow performance of volumes

Attribute	Condition
Read Response Time (Custom > Volumes > Performance)	Greater than or equal to (>=) 20 ms/op
Read Transfer Size (Custom > Volumes > Performance)	Less than or equal to (<=) 256 KiB/op
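The filtering effect of the transfer size condition can be illustrated with a short Python sketch. The function name, parameter names, and sample values are hypothetical; only the two thresholds come from the table.

```python
def slow_read_alert(
    read_response_ms: float,
    read_transfer_kib: float,
    response_limit_ms: float = 20.0,  # ms/op, from the table
    transfer_limit_kib: float = 256.0,  # KiB/op, from the table
) -> bool:
    # Large transfers suggest batch or backup activity, so those samples
    # are excluded even when the read response time is high.
    return (
        read_response_ms >= response_limit_ms
        and read_transfer_kib <= transfer_limit_kib
    )


# Typical application reads (small transfers) with slow responses: alert.
app_result = slow_read_alert(read_response_ms=35.0, read_transfer_kib=16.0)
# Backup-style reads (large transfers) with slow responses: no alert.
backup_result = slow_read_alert(read_response_ms=35.0, read_transfer_kib=512.0)
print(app_result, backup_result)
```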

Receive alerts if a port is being used for both inter-node communication and host I/O exchanges

You want to avoid potential bottlenecks by ensuring that storage system ports aren't being used for both inter-node communication in the local cluster and for I/O exchanges to host computers. You can also use this custom alert to check for adherence to best practices that are related to configuring ports for nodes with 8 or more ports. It does not apply to nodes that contain only 4 ports.

Solution

Define a custom alert that checks if the I/O rate for ports indicates exchanges between local nodes and hosts. For a storage system, set up a custom alert with the following attributes and conditions:

Alert definition for scenario about dual use ports

Attribute	Condition
Total Port-to-Host I/O Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 ops/s
Total Port-to-Local Node I/O Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 ops/s
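To illustrate the check, the following Python sketch flags ports whose host and local-node I/O rates are both at or above the threshold. The dictionary layout, key names, and port names are hypothetical and do not reflect how the product stores port metrics.

```python
def find_dual_use_ports(port_rates: dict, min_rate: float = 0.01) -> list:
    """Return names of ports that carry both host and local-node I/O."""
    return [
        name
        for name, rates in port_rates.items()
        if rates["host"] >= min_rate and rates["local_node"] >= min_rate
    ]


# Hypothetical I/O rates in ops/s for three ports.
port_rates = {
    "port1": {"host": 120.0, "local_node": 0.0},  # host I/O only
    "port2": {"host": 85.5, "local_node": 12.3},  # both: dual use
    "port3": {"host": 0.0, "local_node": 40.0},   # inter-node only
}
dual_use = find_dual_use_ports(port_rates)
print(dual_use)
```

The low threshold of .01 ops/s acts as a "any traffic at all" test, so a port is flagged as soon as it carries both kinds of traffic.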

Tip: Optionally, you can define other custom alerts to be notified of this situation, depending on your storage requirements. For example:

Attribute	Condition
Total Port-to-Disk I/O Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 ops/s
Total Port-to-Local Node I/O Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 ops/s
Total Port-to-Remote I/O Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 ops/s

Receive alerts for link resets that are not associated with link initialization

You want to identify link resets that are generated in response to hardware failures or link congestion. Link resets that are generated by link initialization are ignored.

Solution

Define a custom alert that checks if link resets occur and if those resets are not associated with a link initialization. For a switch, set up a custom alert with the following attributes and conditions:

Alert definition for scenario about link resets that are not associated with link initialization

Attribute	Condition
Link Reset Received Rate (Custom > Ports > Performance) OR Link Reset Transmitted Rate (Custom > Ports > Performance)	Greater than or equal to (>=) .01 cnt/s
Sync Loss (Custom > Ports > Performance)	Less than or equal to (<=) 0 cnt/s
Signal Loss (Custom > Ports > Performance)	Less than or equal to (<=) 0 cnt/s
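This definition mixes an OR (received or transmitted link resets) with AND conditions (no sync loss and no signal loss). A minimal Python sketch of that combination follows; the function and parameter names are hypothetical, and all rates are assumed to be in cnt/s.

```python
def link_reset_alert(
    reset_rx_rate: float,
    reset_tx_rate: float,
    sync_loss_rate: float,
    signal_loss_rate: float,
    min_rate: float = 0.01,  # cnt/s, from the table
) -> bool:
    # Either direction of link reset is enough to satisfy the first row.
    resets_seen = reset_rx_rate >= min_rate or reset_tx_rate >= min_rate
    # Sync loss or signal loss points to link initialization, so both
    # must be absent for the alert to trigger.
    no_link_init = sync_loss_rate <= 0 and signal_loss_rate <= 0
    return resets_seen and no_link_init


# Resets without sync or signal loss: likely a failure or congestion.
congestion_result = link_reset_alert(0.0, 0.5, 0.0, 0.0)
# Resets accompanied by sync loss: treated as link initialization.
init_result = link_reset_alert(0.5, 0.5, 0.2, 0.0)
print(congestion_result, init_result)
```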

Receive alerts for invalid word transmissions that are not associated with link initialization

You want to identify invalid transmission words that are generated because of poor link quality. Poor or marginal link quality can be caused by a bad SFP, HBA, or cable. Invalid transmission words that are generated by link initialization are ignored.

Solution

Define a custom alert that checks if invalid word transmissions occur and if those transmissions are not associated with a link initialization. For a switch, set up a custom alert with the following attributes and conditions:

Alert definition for scenario about invalid word transmissions that are not associated with link initialization

Attribute	Condition
Invalid Transmission Words (Custom > Ports > Performance)	Greater than or equal to (>=) .01 cnt/s
Sync Loss (Custom > Ports > Performance)	Less than or equal to (<=) 0 cnt/s
Signal Loss (Custom > Ports > Performance)	Less than or equal to (<=) 0 cnt/s