Defining general group alerts for performance metrics

You can define alerts that are triggered when the performance of the storage systems or switches that belong to a general group falls outside a specified threshold.

About this task

To define a general group alert for performance metrics, select the metric that you want to measure and specify a threshold value. When the measured value of that metric for a resource in the group falls outside the threshold, an alert is generated. For example, you can define an alert that notifies you when the back-end response times for managed disks on a SAN Volume Controller exceed 35 milliseconds per read operation. The Overall Back-end Response Time is a metric that measures the average number of milliseconds that it takes to service each read operation on a managed disk.
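
As a rough illustration of how a metric, an operator, and a threshold value combine into an alert condition, consider the following Python sketch. It is not IBM Spectrum Control code: the `violates` function, the operator strings, and the sample values are hypothetical and exist only to make the 35 ms example concrete.

```python
import operator

# Hypothetical mapping of the two operator choices to Python comparisons.
OPERATORS = {">=": operator.ge, "<=": operator.le}


def violates(value, op, threshold):
    """Return True when a measured value falls outside the threshold."""
    return OPERATORS[op](value, threshold)


# Made-up back-end response times in ms per read operation for one managed disk.
samples_ms = [12.0, 18.5, 36.2, 41.0, 22.3]

# Each sample at or above the 35 ms threshold would generate an alert.
alerts = [s for s in samples_ms if violates(s, ">=", 35.0)]
print(alerts)
```

The sketch treats each sample independently. Settings that suppress repeated notifications are left out.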

Procedure

To define a general group alert for performance metrics, complete these steps:

  1. In the menu bar, select Groups > General Groups.
  2. Right-click a general group in the list and click View Alert Definitions.
  3. Click Edit Alert Definitions.
  4. Click the type of resource that you want to alert on.
  5. Click Performance and then click Add Metrics.
  6. Select one or more metrics to alert on and click OK.
  7. To enable the alert for a performance metric, click the check mark for the metric.
  8. Specify the conditions for generating an alert.
    Conditions include an operator and a threshold value.
    1. Select an operator.
      The operator determines whether an alert is triggered when the performance of a resource is greater than or equal to the specified threshold value, or when it is less than or equal to that value.
    2. Enter a threshold value.
      For example, to trigger an alert if the Total I/O Rate for a storage system is greater than or equal to 500 ops/s, enter the value 500.
      Tips for threshold values:
      • IBM Spectrum Control provides recommended threshold values for metrics whose thresholds do not vary much between environments. For example, the default threshold values for Port Send Bandwidth Percentage are greater than or equal to 75% for warning alerts, and greater than or equal to 85% for critical alerts.

        However, for metrics that measure throughput and response times, thresholds can vary because of workload, model of hardware, amount of cache memory, and other factors. In these cases, there are no recommended values. To help determine threshold values for a resource, collect performance data over time to establish a baseline of the normal and expected performance behavior for that resource. After you determine a set of baseline values, define alerts to trigger if the measured performance behavior falls outside the normally expected range.

      • For some metrics, lower values might indicate more stress and higher values might indicate idle behavior. For example, a lower threshold value for the Cache Holding Time Threshold metric might indicate a performance problem.
  9. Optional: Click View Performance to view a chart of the performance of the resource. Use the chart to evaluate the current and historical performance of a resource to help determine the threshold value for an alert.
    The chart displays a horizontal colored line at the specified threshold value. The color of the line indicates the severity of the alert:
    • Critical alert: red
    • Warning alert: yellow
    • Informational alert: blue

    For multi-conditional alerts, the chart displays a horizontal line for each condition that shows the threshold value and severity.

    To customize the chart, click Top 10 or Bottom 10 to show resources according to their performance, click a time period, and change the start and end dates for the data that is displayed.

  10. Assign a severity to an alert.
    Assigning a severity can help you more quickly identify and address the critical performance conditions that are detected on resources. The severity that you assign depends on the guidelines and procedures within your organization. Default assignments are provided for each alert.
    The following severities are available:
    • Critical: Assign this severity to alerts that are critical and must be resolved. For example, assign a critical severity to alerts that notify you when the Port Send Bandwidth Percentage is greater than or equal to 85%.
    • Warning: Assign this severity to alerts that are not critical, but represent potential problems. For example, assign a warning severity to alerts that notify you when the Port Send Bandwidth Percentage is greater than or equal to 75% but less than 85%.
    • Informational: Assign this severity to alerts that might not require any action to resolve and are primarily for informational purposes.
  11. Optional: If you want to send email notifications of alert violations to contacts other than the policy contacts or global alert notification addresses, enter the email addresses in the Email Override field.
    Tip: If you enter email addresses in the Email Override field, only those addresses receive notifications for the alert. The following contacts do not receive notifications:
    • Any email addresses that are specified as policy contacts, if the alert is in an alert policy.
    • Any global email addresses that are specified for alert notifications. To view the global alert notification addresses, go to Settings > Notification Settings.
  12. Optional: Click View Additional Options to specify how frequently you are notified of alerts.
    Use these settings to avoid triggering too many alerts for some conditions.
  13. Optional: Click the Duplicate alert icon to duplicate an alert.
    Use this action when you want to define another alert for the same metric but with different criteria and settings.
    Duplicating alerts can be helpful in the following situations:
    • When you want to generate separate warning alerts and critical alerts for different thresholds on the same metric.
      For example, for the CRC Error Rate metric for ports, you might want to define the following alerts:
      • Define a warning alert to be generated when the number of frames per second that are received with cyclic redundancy check (CRC) errors is greater than or equal to 0.01 counts per second.
      • Duplicate the alert, but this time, specify a critical severity for when the CRC error rate is greater than or equal to 0.03 counts per second.
    • When you want to send alert notifications to different people based on the severity of an alert.

      In the previous example for the CRC Error Rate metric, you can configure the notification settings so that warning alerts are sent to junior administrators, while critical alerts are sent to more senior administrators to resolve.

  14. Click Save Changes.
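
The choices made in this procedure can be summarized in a small model. The following Python sketch is an illustration under stated assumptions, not the product's API: the `AlertDefinition` class, its fields, the `suggest_threshold` helper, and the email addresses are all hypothetical. It reuses the CRC Error Rate thresholds from the duplication step, mirrors the Email Override behavior, and shows one possible way (mean plus a multiple of the standard deviation, an assumed heuristic) to turn baseline data into a candidate threshold.

```python
from dataclasses import dataclass, replace
from statistics import mean, pstdev


def suggest_threshold(baseline, k=3.0):
    """Assumed heuristic (not from the product): mean + k standard deviations."""
    return mean(baseline) + k * pstdev(baseline)


@dataclass(frozen=True)
class AlertDefinition:
    """Hypothetical stand-in for one alert definition on a general group."""
    metric: str
    operator: str               # ">=" or "<="; anything else is treated as "<="
    threshold: float
    severity: str               # "critical", "warning", or "informational"
    email_override: tuple = ()  # empty: fall back to policy and global contacts

    def is_triggered(self, value):
        if self.operator == ">=":
            return value >= self.threshold
        return value <= self.threshold

    def recipients(self, policy_contacts, global_contacts):
        # An override replaces the other contacts instead of adding to them.
        if self.email_override:
            return list(self.email_override)
        return list(policy_contacts) + list(global_contacts)


def triggered(definitions, value):
    """Return the definitions whose condition the measured value meets."""
    return [d for d in definitions if d.is_triggered(value)]


warning = AlertDefinition("CRC Error Rate", ">=", 0.01, "warning",
                          email_override=("junior-admin@example.com",))
# Duplicate the alert, changing only the criteria and settings that differ.
critical = replace(warning, threshold=0.03, severity="critical",
                   email_override=("senior-admin@example.com",))

for rate in (0.005, 0.02, 0.05):
    hits = triggered([warning, critical], rate)
    print(rate, [(d.severity, d.recipients([], [])) for d in hits])

# Made-up baseline of Total I/O Rate samples in ops/s.
print(suggest_threshold([420, 450, 480, 440, 460]))
```

In this sketch a rate of 0.05 matches both definitions. How the product reports overlapping conditions is not modeled here.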