Triggering conditions for switch alerts

You can set up IBM Spectrum Control so that it examines the attributes and performance of a switch and notifies you when changes or violations are detected.

Alerts can notify you of general changes and performance issues on the following resources: switches, inter-switch connections, and ports.

Important: Not all the attributes upon which you can alert are listed here. To view a complete list of attributes upon which you can alert, either edit the alert definitions for a switch with no alert policy assigned, or create a new custom alert policy and define new alerts in the policy. To create a custom alert policy, go to Configuration > Alert Policies and click Create Policy.

Attributes that are automatically configured for alerts in the default alert policies, or default alerts, have a status of Active. In the following information, default alerts are marked with an asterisk (*).

Switches (general changes)

Table 1. Pre-defined alerts for Switches
General Attributes | Defining Conditions for Attributes

Last Successful Probe

Last Successful Monitor

A specified amount of time has passed since a probe or performance monitor was able to collect data about a switch. You can use this alert to be notified when up-to-date configuration, status, or performance data is not being collected about a switch and its existing data might be stale. This situation might occur if the resource, network, or IBM Spectrum Control server is unavailable.

Probe Status*

One of the following statuses is detected for a probe:
Not Successful
An error or warning occurred during data collection. This status indicates that a probe did not collect any data, or only collected a partial set of data about a resource.
Warning
A probe completed, but might not have collected a complete set of data. This status might occur if data cannot be collected about one or more of the internal resources of a resource.
Error (default)
A probe did not complete when it attempted to collect asset data about the resource. This status might occur if the resource cannot be reached during data collection.
For details about why a specific status occurred, check the log for the probe. To check the log, go to the details page for a resource, click Data Collection, and select Actions > Open Logs in the Probe section on the Data Collection page.

Performance Monitor Status*

One of the following statuses is detected for a performance monitor:
Not Normal
An error or warning occurred during data collection. This status indicates that a performance monitor did not collect any data, or only collected a partial set of data about a resource.
Warning
A performance monitor completed, but did not collect a complete set of performance data. This status might occur if the resource was rebooted during data collection, no valid performance data was provided by the resource, or a communication error occurred with the resource or its associated agent.
Error
A performance monitor did not complete when it attempted to collect performance data about the resource. This status might occur if the resource cannot be reached during data collection, or if no configuration data is available for the resource.
For details about why a specific status occurred, check the log for the performance monitor. To check the log, go to the details page for a resource, click Data Collection, and select Actions > Open Logs in the Performance Monitor section on the Data Collection page.

Status*

One of the following conditions is detected on a switch:
Not Normal
An error or warning status was detected on the switch or its internal resources.
Warning
A warning status was detected on the switch or its internal resources.
Error (default)
An error status was detected on the switch or its internal resources. For example, an error status might occur when a switch goes offline.
Unreachable
One or more of the monitored resources for a switch are not responding. This status might be caused by a problem in the network.
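
The general conditions in Table 1 can be read as simple matching rules. The following is a minimal sketch of that logic, not IBM Spectrum Control code: the names `CONDITION_MATCHES`, `status_triggers`, and `data_is_stale` are hypothetical, and the mapping assumes that Not Normal covers only the warning and error statuses, as described above.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from an alert condition to the detected statuses
# that would trigger it, based on the descriptions in Table 1.
CONDITION_MATCHES = {
    "Not Normal": {"Warning", "Error"},
    "Warning": {"Warning"},
    "Error": {"Error"},
    "Unreachable": {"Unreachable"},
}


def status_triggers(condition, detected_status):
    """Return True if the detected status satisfies the alert condition."""
    return detected_status in CONDITION_MATCHES[condition]


def data_is_stale(last_success, now, max_age):
    """Return True if more than max_age has passed since the last
    successful probe or performance monitor run."""
    return now - last_success > max_age


print(status_triggers("Not Normal", "Warning"))
print(data_is_stale(datetime(2024, 1, 1), datetime(2024, 1, 3), timedelta(days=1)))
```

In this sketch, a Not Normal condition fires on either a warning or an error, while an Error condition ignores warnings. That difference is why Error is the narrower default for Probe Status and Status.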

Switches (performance)

Define alerts that notify you when the performance of a switch falls outside a specified threshold. In alerts, you can specify conditions based on metrics that measure the performance of switch ports, including I/O, data, and error rates, and frame transfer sizes. By creating alerts with performance conditions, you can be informed about potential bottlenecks in your network infrastructure.

For example, you can define an alert to be notified when the port congestion index for a port is greater than or equal to a specified threshold. Port congestion represents the estimated degree to which frame transmission was delayed due to a lack of buffer credits. Use this alert to help identify port conditions that might slow the performance of the resources to which those ports are connected.

You can also be notified when a metric is less than a specified threshold, such as when you want to identify ports that might be underused.
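
The two kinds of performance condition described above, "greater than or equal to" and "less than", can be sketched as a single comparison. This is an illustration only: the function name `violates`, the operator strings, and the sample values are assumptions and do not reflect how the product stores alert definitions.

```python
import operator

# Hypothetical operator table covering the two comparisons described above.
COMPARATORS = {
    ">=": operator.ge,  # for example, port congestion index too high
    "<": operator.lt,   # for example, a port that might be underused
}


def violates(value, op, threshold):
    """Return True if the metric value violates the threshold."""
    return COMPARATORS[op](value, threshold)


print(violates(85.0, ">=", 80.0))  # congestion at or above the threshold
print(violates(2.5, "<", 5.0))     # data rate below the threshold
```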

For a complete list of switch metrics that can be alerted upon, see Performance metrics for switches.
Tips for performance conditions:
  • A performance monitor must collect data about a resource before IBM Spectrum Control can determine whether a threshold is violated and an alert is generated for a performance condition.
  • When you define performance alerts for inter-switch connections, the performance of all the inter-switch links (ISLs) in the trunk is aggregated and compared to the threshold value. To measure and alert on performance, you must collect performance metadata about the ISLs for both switches involved in the connection. To determine whether a single ISL exceeds a threshold, define an alert for port performance.

    You can alert only on the performance of ISL connection types. Performance metadata is not collected for the other types of inter-switch connections, such as ISL Trunks.
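
The difference between an aggregated inter-switch connection alert and a per-port alert can be sketched as follows. The aggregation method is not stated here, so this example assumes a plain sum, passed in as a replaceable `aggregate` parameter. The function names and sample values are hypothetical.

```python
def trunk_violates(link_values, threshold, aggregate=sum):
    """Compare the aggregate of all ISL values in a connection to the threshold.

    The default aggregate (sum) is an assumption for illustration only.
    """
    if not link_values:
        return False  # no performance data collected yet, nothing to evaluate
    return aggregate(link_values) >= threshold


def links_violating(link_values, threshold):
    """Return the indexes of individual links at or above the threshold,
    which corresponds to a port-level check."""
    return [i for i, value in enumerate(link_values) if value >= threshold]


values = [30.0, 40.0, 35.0]  # made-up per-link metric values
print(trunk_violates(values, 100.0))
print(links_violating(values, 100.0))
```

With these sample values, the aggregate reaches the threshold even though no single link does, which is why a separate port performance alert is needed to catch an individual ISL.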

Best practice: When you set thresholds for performance conditions, try to determine the best value so you can derive the maximum benefit without generating too many false alerts. Because suitable thresholds are highly dependent on the type of workload that is being run, hardware configuration, the number of physical disks, exact model numbers, and other factors, there are no easy or standard default rules.

A recommended approach is to monitor the performance of resources for a number of weeks and then use this historical data to determine reasonable threshold values for each performance condition. After that is done, you can fine-tune the condition settings to minimize the number of false alerts.
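
One way to turn several weeks of history into a starting threshold is to pick a high percentile of the observed values. The sketch below uses `statistics.quantiles` from the Python standard library. The choice of the 95th percentile, the function name `suggest_threshold`, and the sample data are assumptions for illustration, not a product recommendation.

```python
import statistics


def suggest_threshold(history, percentile=95):
    """Suggest a starting threshold from historical metric samples.

    percentile must be an integer from 1 to 99. The "inclusive" method
    keeps the result within the range of the observed data.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    cut_points = statistics.quantiles(history, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Made-up history: 100 samples with values 1 through 100.
history = list(range(1, 101))
threshold = suggest_threshold(history)
would_alert = sum(1 for value in history if value >= threshold)
print(threshold, would_alert)
```

A higher percentile produces fewer alerts but can miss real problems, so the resulting value is only a first estimate to fine-tune against the actual workload.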

Click Edit Alert Definitions, and for each performance alert definition, click View History to see the history of switch, port, or trunk performance. Then set the threshold that you want relative to that data.

Inter-Switch Connections (performance)

Define alerts that notify you when the performance of ISLs falls outside a specified threshold. In alerts, you can specify conditions based on metrics that measure the performance of the ISL, including I/O, data, and error rates, and frame transfer sizes. By creating alerts with performance conditions, you can be informed about potential bottlenecks in your network infrastructure.

For example, you can define an alert to be notified when the aggregate port congestion index of the ports in the ISL is greater than or equal to a specified threshold. Port congestion represents the estimated degree to which frame transmission was delayed due to a lack of buffer credits. Use this alert to help identify port conditions that might slow the performance of the resources to which those ports are connected. You can also be notified when a metric is less than a specified threshold, such as when you want to identify ISLs that might be underused.

Tip: You can alert only on the performance of ISL connections. Performance metadata is not collected for the other types of inter-switch connections.

For a complete list of inter-switch connection metrics that can be alerted upon, see Performance metrics for switches.

Ports (general changes)

Table 2. Pre-defined alerts for ports
General Attributes | Defining Conditions for Attributes

Removed Port

A previously monitored port can no longer be found. Historical data about the port is retained, but no current data is being collected. Use this alert to be notified if a port is removed or becomes unavailable.

State

A port is online, enabled but offline, or disabled.

Status*

One of the following statuses is detected for a port:
Not Normal
An error or warning status is detected on a port.
Warning
A warning status is detected on a port. This status might occur if the switch is stopped, starting, or in service (being maintained, cleaned, or administered).
Error
An error status is detected on a port.

Ports (performance)

Define alerts that notify you when the performance of a port falls outside a specified threshold. In alerts, you can specify conditions based on metrics that measure the performance of switch ports, including I/O, data, and error rates, and frame transfer sizes. By creating alerts with performance conditions, you can be informed about potential bottlenecks in your network infrastructure.

For example, you can define an alert to be notified when the port congestion index for a port is greater than or equal to a specified threshold. Port congestion represents the estimated degree to which frame transmission was delayed due to a lack of buffer credits. Use this alert to help identify port conditions that might slow the performance of the resources to which those ports are connected.

You can also be notified when a metric is less than a specified threshold, such as when you want to identify ports that might be underused.

For a complete list of port metrics that can be alerted upon, see Performance metrics for switches.