Defining alert definitions for performance changes

You can define alerts that are triggered when the performance of a resource falls outside a specified threshold.

Procedure

  1. To define alerts for resources, choose one of the following options:
    • Define alerts for a policy:
      1. Go to Settings > Alert Policies.
      2. Double-click the policy.
      3. Click Edit Alert Definitions on the Alert Definitions tab.
    • Define alerts for a resource that is not managed by a policy:
      1. Go to the resource list page for the resource. For example, to define alerts for a block storage system, go to Storage > Block Storage Systems. To define alerts for a switch, go to Network > Switches.
      2. Right-click the resource for which you want to define alerts, then click View Alert Definitions.
      3. Click Edit Alert Definitions.
  2. Click the type of resource that you want to alert on. For example, click Storage System.
  3. Click the Performance category.
  4. To enable the alert for a performance metric, click the check mark for the metric. If the metric that you want is not displayed, click Add Metrics, then select the metric that you want.
  5. Specify the conditions for generating an alert.
    Conditions include an operator and a threshold value.
    1. Select an operator.
      The operator determines whether an alert is triggered when the performance of a resource is greater than or equal to the specified threshold value, or when it is less than or equal to that value.
    2. Enter a threshold value.
      For example, to trigger an alert if the Total I/O Rate for a storage system is greater than or equal to 500 ops/s, enter the value 500.
      Tips for threshold values:
      • IBM Spectrum® Control provides recommended threshold values for metrics whose thresholds do not vary much between environments. For example, the default threshold values for Port Send Bandwidth Percentage are greater than or equal to 75% for warning alerts, and greater than or equal to 85% for critical alerts.

        However, for metrics that measure throughput and response times, thresholds can vary because of workload, model of hardware, amount of cache memory, and other factors. In these cases, there are no recommended values. To help determine threshold values for a resource, collect performance data over time to establish a baseline of the normal and expected performance behavior for that resource. After you determine a set of baseline values, define alerts to trigger if the measured performance behavior falls outside the normally expected range.

      • For some metrics, lower values might indicate more stress and higher values might indicate idle behavior. For example, a low value for the Cache Holding Time Threshold metric might indicate a performance problem.
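The baseline approach described in the tips above can be sketched in a few lines of Python. This is a minimal illustration, not a product feature: the sample values are made up, the function names are hypothetical, and the mean-plus-three-standard-deviations rule is only one possible way to turn a baseline into a threshold.

```python
import operator
import statistics

# The two operator choices that the procedure describes.
OPERATORS = {">=": operator.ge, "<=": operator.le}


def suggest_threshold(samples, k=3.0):
    """Suggest an upper threshold as mean + k population standard deviations.

    The mean-plus-k-sigma rule is an illustrative assumption, not a
    product recommendation.
    """
    mean = statistics.mean(samples)
    spread = statistics.pstdev(samples)
    return mean + k * spread


def violates(value, op, threshold):
    """Return True when the measured value meets the alert condition."""
    return OPERATORS[op](value, threshold)


# Made-up Total I/O Rate samples (ops/s) collected over a baseline period.
baseline = [310, 295, 330, 305, 320, 300, 315, 325]
threshold = suggest_threshold(baseline)
print(round(threshold, 1))
print(violates(500, ">=", threshold))
```

For metrics where low values indicate stress, the same check would use the "<=" operator with a threshold below the baseline range.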
  6. Optional: Click View Performance to view a chart of the performance of the resource. Use the chart to evaluate the current and historical performance of a resource to help determine the threshold value for an alert.
    The chart displays a colored horizontal line at the specified threshold value. The color of the line indicates the severity of the alert:
    • Critical alert: red
    • Warning alert: yellow
    • Informational alert: blue

    For multi-conditional alerts, the chart displays a horizontal line for each condition that shows the threshold value and severity.

    To customize the chart, click Top 10 or Bottom 10 to show resources according to their performance, click a time period, and change the start and end dates for the data that is displayed.

  7. Assign a severity to an alert.
    Assigning a severity can help you more quickly identify and address the critical conditions that are detected on resources. The severity that you assign depends on the guidelines and procedures within your organization. Default assignments are provided for each alert.
    The following severities are available:
    • Critical: Assign this severity to alerts that are critical and need to be resolved. For example, assign a critical severity to alerts that notify you when the Port Send Bandwidth Percentage is greater than or equal to 85%.
    • Warning: Assign this severity to alerts that are not critical, but represent potential problems. For example, assign a warning severity to alerts that notify you when the Port Send Bandwidth Percentage is greater than or equal to 75% but less than 85%.
    • Informational: Assign this severity to alerts that might not require any action to resolve and are primarily for informational purposes.
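As a rough illustration of how the Port Send Bandwidth Percentage example maps measured values to severities, the following Python sketch applies the 75% and 85% thresholds from the descriptions above. The names are hypothetical, and the logic is a simplified model rather than the product's implementation.

```python
# Thresholds taken from the Port Send Bandwidth Percentage example.
# Ordered from most to least severe so that the first match wins.
SEVERITY_THRESHOLDS = [("critical", 85.0), ("warning", 75.0)]


def classify(bandwidth_pct):
    """Return the most severe level whose threshold is met, or None."""
    for severity, threshold in SEVERITY_THRESHOLDS:
        if bandwidth_pct >= threshold:
            return severity
    return None


for sample in (60.0, 75.0, 84.9, 85.0):
    print(sample, classify(sample))
```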
  8. Optional: If you want to send email notifications of alert violations to contacts other than the policy contacts or global alert notification addresses, enter the email addresses in the Email Override field.
    Tip: If you enter an email address in the Email Override field, only that email address receives notifications for the alert. The following contacts do not receive notifications:
    • Any email addresses that are specified as policy contacts, if the alert is in an alert policy.
    • Any global email addresses that are specified for alert notifications. To view the global alert notification addresses, go to Settings > Notification Settings.
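The effect of the Email Override field can be modeled with a short Python sketch. The function name and the example addresses are hypothetical. The sketch also assumes that, when no override is set, both the policy contacts and the global addresses are notified; the tip above states only that an override replaces them.

```python
def resolve_recipients(email_override, policy_contacts, global_addresses):
    """Return the addresses that are notified for one alert.

    Assumption: without an override, policy contacts and global
    addresses are both notified. With an override, only the override
    addresses are notified, as the tip describes.
    """
    if email_override:
        return list(email_override)
    # Merge the defaults, dropping duplicates but keeping order.
    return list(dict.fromkeys([*policy_contacts, *global_addresses]))


policy = ["storage-team@example.com"]
global_list = ["ops@example.com", "storage-team@example.com"]

print(resolve_recipients([], policy, global_list))
print(resolve_recipients(["oncall@example.com"], policy, global_list))
```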
  9. Optional: Click View Additional Options to specify how frequently you are notified of alerts.
    Use these settings to avoid triggering too many alerts for some conditions.
  10. Optional: Click View Additional Options to specify that the following actions are taken when alert conditions are detected on monitored resources:
    Run script
    Run a script when an alert is triggered for the condition. Use a script to call external programs or run commands that take action as the result of an alert. By using a script, you can automatically address potential storage issues when they are detected to avoid unplanned downtime or performance bottlenecks.
    Netcool® / OMNIbus
    Send alert notifications to a Netcool server or OMNIbus EIF probe server within your environment that was configured to receive IBM Spectrum Control alerts.
    SNMP
    Generate SNMP trap messages to any network management station (NMS), console, or terminal when an alert condition is detected. System administrators must set up their SNMP trap receiver with the provided management information base (MIB) files to receive SNMP traps from the product.
    Windows event log or UNIX syslog
    Write alert messages to the OS log. If you already have an administrator monitoring OS logs, this method is a way to centralize your priority messages for quick notification and viewing.
  11. Optional: Click the duplicate alert icon to duplicate an alert.
    Use this action when you want to define another alert for the same metric but with different conditions and settings.
    Duplicating alerts can be helpful in the following situations:
    • When you want to generate separate warning alerts and critical alerts for different thresholds on the same metric.
      For example, for the CRC Error Rate metric for ports, you might want to define the following alerts:
      • Define a warning alert to be generated when the number of frames per second that are received with cyclic redundancy check (CRC) errors is greater than or equal to 0.01 counts per second.
      • Duplicate the alert, but this time, specify a critical severity when the CRC error rate is greater than or equal to 0.03 counts per second.
    • When you want to send alert notifications to different people based on the severity of an alert.

      In the previous example for the CRC Error Rate metric, you can configure the notification settings so that warning alerts are sent to junior administrators, while critical alerts are sent to more senior administrators to resolve.
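The CRC Error Rate example can be sketched in Python to show what duplicating an alert amounts to: a copy of the same definition with a different threshold, severity, and set of recipients. The AlertDefinition class, the email addresses, and the rule that every matching definition is triggered are assumptions made for illustration, not product data structures or behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlertDefinition:
    # Hypothetical model of one alert definition.
    metric: str
    threshold: float  # counts per second
    severity: str
    recipients: tuple


warning = AlertDefinition(
    metric="CRC Error Rate",
    threshold=0.01,
    severity="warning",
    recipients=("junior-admins@example.com",),
)
# "Duplicate" the warning alert, then change only what differs.
critical = replace(
    warning,
    threshold=0.03,
    severity="critical",
    recipients=("senior-admins@example.com",),
)


def triggered(definitions, measured_rate):
    """Return every definition whose threshold the measured rate meets."""
    return [d for d in definitions if measured_rate >= d.threshold]


for alert in triggered([warning, critical], 0.02):
    print(alert.severity, alert.recipients)
```

In this model, a measured rate of 0.02 counts per second matches only the warning definition, while a rate at or above 0.03 matches both.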

  12. Click Save Changes.

Results

To view all the alerts generated by IBM Spectrum Control, go to Home > Alerts in the GUI.

Tip: If a performance monitor is already collecting data about a resource when you add, modify, or remove a performance alert for that resource, changes are applied dynamically. You do not have to stop and restart the performance monitor to apply the changes. A confirmation message is recorded in the log of the performance monitor when the alert is updated.