Defining application alerts for performance metrics

You can define alerts that are triggered when the performance of the volumes that belong to an application falls outside a specified threshold.

About this task

To define an application alert for performance metrics, select the metric that you want to measure for the volumes in an application and specify a threshold value. When the performance of a volume falls outside the threshold, an alert is generated. For example, by using the Read Response Time metric you can define an alert that notifies you when the response time for a volume exceeds 20 milliseconds per read operation.
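The relationship between a metric, an operator, and a threshold value can be sketched in a few lines of Python. This is only an illustration of the concept, not the product's implementation or API: the `AlertCondition` class, the volume names, and the sample values are hypothetical, and the sketch assumes a greater-than-or-equal-to operator with the 20 ms/op threshold from the example above.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertCondition:
    """Hypothetical model of one alert condition: metric, operator, threshold."""
    metric: str
    compare: Callable[[float, float], bool]  # e.g. operator.ge or operator.le
    threshold: float

    def is_violated(self, value: float) -> bool:
        # The condition is violated when the measured value satisfies the operator.
        return self.compare(value, self.threshold)


# Assumed condition: Read Response Time >= 20 ms/op.
read_response_time = AlertCondition("Read Response Time (ms/op)", operator.ge, 20.0)

# Made-up measurements for three volumes in an application.
samples = {"vol_a": 4.2, "vol_b": 27.5, "vol_c": 20.0}

violations = [name for name, value in samples.items()
              if read_response_time.is_violated(value)]
print(violations)  # prints ['vol_b', 'vol_c']
```

Swapping `operator.ge` for `operator.le` models metrics where lower values indicate a problem.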

Procedure

To define an application alert for performance metrics, complete these steps:

  1. In the menu bar, select Groups > Applications.
  2. Right-click an application in the list and click View Alert Definitions.
  3. Click Edit Alert Definitions.
  4. Click Volumes or Applications, and then click Performance.
    If you set performance metrics under the Applications category, the metrics for all components of the application are aggregated and used for alerting.
  5. Click Add Metrics.
  6. Select one or more metrics to alert on and click OK.
  7. To enable the alert for a performance metric, click the check mark for the metric.
  8. Specify the conditions for generating an alert.
    Conditions include an operator and a threshold value.
    1. Select an operator.
      The operator determines whether an alert is triggered when the performance of a resource is greater than or equal to the specified threshold value, or when it is less than or equal to that value.
    2. Enter a threshold value.
      For example, to trigger an alert if the Total I/O Rate for a storage system is greater than or equal to 500 ops/s, enter the value 500.
      Tips for threshold values:
      • IBM Storage Insights Pro provides recommended threshold values for metrics whose thresholds do not vary much between environments. For example, the default threshold values for Port Send Bandwidth Percentage are greater than or equal to 75% for warning alerts, and greater than or equal to 85% for critical alerts.

        However, for metrics that measure throughput and response times, thresholds can vary because of workload, model of hardware, amount of cache memory, and other factors. In these cases, there are no recommended values. To help determine threshold values for a resource, collect performance data over time to establish a baseline of the normal and expected performance behavior for that resource. After you determine a set of baseline values, define alerts to trigger if the measured performance behavior falls outside the normally expected range.

      • For some metrics, lower values might indicate more stress and higher values might indicate idle behavior. For example, a lower threshold value for the Cache Holding Time Threshold metric might indicate a performance problem.
  9. Assign a severity to an alert.
    Assigning a severity can help you more quickly identify and address the critical performance conditions that are detected on volumes. The severity that you assign depends on the guidelines and procedures within your organization. Default assignments are provided for each alert.
    The following severities are available:
    • Critical: Assign this severity to alerts that are critical and must be resolved. For example, you might assign a critical severity to alerts that notify you when the average Read Response Time for volumes is greater than or equal to 20 ms/op.
    • Warning: Assign this severity to alerts that are not critical, but represent potential problems. For example, you might assign a warning severity to alerts that notify you when the average Read Response Time for volumes is greater than 10 ms/op but less than 20 ms/op.
    • Informational: Assign this severity to alerts that might not require any action to resolve and are primarily for informational purposes.
  10. Optional: Click View History to view a chart of the performance of the resource. Use the chart to evaluate the current and historical performance of a resource to help determine the threshold value for an alert.
    The chart uses colored lines to represent the different threshold values and severities that can be defined for an alert:
    • Critical alert: red
    • Warning alert: orange
    • Informational alert: blue
    To customize the chart, click Top 10 or Bottom 10 to show resources according to their performance, click a time period, and change the start and end dates for the data that is displayed.
  11. Optional: If you want to send email notifications of alert violations to contacts other than the policy contacts or global alert notification addresses, enter the email addresses in the Email Override field.
    Tip: If you enter email addresses in the Email Override field, only those addresses receive notifications for the alert. The following contacts do not receive notifications:
    • Any email addresses that are specified as policy contacts, if the alert is in an alert policy.
    • Any global email addresses that are specified for alert notifications. To view the global alert notification addresses, go to Configuration > Settings.
  12. Optional: Click View Additional Options to specify how frequently you are notified of alerts.
    Use these settings to avoid triggering too many alerts for some conditions.
  13. Optional: Click the Duplicate alert icon to duplicate an alert.
    Use this action when you want to define another alert for the same metric but with different criteria and settings.
    Duplicating alerts can be helpful in the following situations:
    • When you want to generate separate warning alerts and critical alerts for different thresholds on the same metric.
      For example, for the Read Response Time metric for volumes, you might want to define the following alerts:
      • Define a warning alert to be generated when the average time that it takes to service each read operation is greater than or equal to 10 milliseconds.
      • Duplicate the alert, but this time, specify a critical severity for when the average time is greater than or equal to 20 milliseconds.
    • When you want to send alert notifications to different people based on the severity of an alert.

      In the previous example for the Read Response Time metric, you can configure the notification settings so that warning alerts are sent to junior administrators, while critical alerts are sent to more senior administrators to resolve.

  14. Click Save Changes.
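
Two ideas from the procedure, deriving threshold values from a performance baseline and pairing a warning alert with a critical alert on the same metric, can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommendation from the product: the function names are hypothetical, the history values are made up, and placing the warning and critical thresholds at two and three standard deviations above the baseline mean is an arbitrary choice that you would adjust for your own environment.

```python
import statistics
from typing import Optional, Sequence, Tuple


def suggest_thresholds(history: Sequence[float],
                       warn_sigmas: float = 2.0,
                       crit_sigmas: float = 3.0) -> Tuple[float, float]:
    """Return (warning, critical) thresholds from baseline samples.

    Assumption: higher values are worse, and thresholds are placed a fixed
    number of standard deviations above the baseline mean.
    """
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean + warn_sigmas * spread, mean + crit_sigmas * spread


def classify(value: float, warning: float, critical: float) -> Optional[str]:
    """Return the severity for a measured value, or None if no alert applies."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return None


# Made-up baseline of Read Response Time samples (ms/op) for one volume.
history = [4.0, 6.0, 4.0, 6.0]
warning, critical = suggest_thresholds(history)

for measured in (5.0, 7.5, 9.0):
    print(measured, classify(measured, warning, critical))
```

With the fixed values from the Read Response Time example in the procedure, `classify(12.0, 10.0, 20.0)` returns `"warning"` and `classify(20.0, 10.0, 20.0)` returns `"critical"`.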