Alerts and alert policies

Specify conditions that trigger alerts and the actions to take when those alerts are triggered, such as sending a notification to an email address. Use alert policies to define those alert conditions and notification settings for a group of resources.

Alerting functions examine the attributes, capacity, and performance of resources. If the conditions that are defined for an alert are met, the actions that are specified for the alert are taken. Typically, the actions include sending a notification. For example, if the status of a SAN Volume Controller storage system changes to Error, an alert is displayed on the Alerts page in the GUI, and an email might be sent to a storage administrator.

You can manage alerts in your storage environment in the following ways:

  • Use alert policies to manage the alert definitions and notification settings that apply to different sets of resources. For example, you can use one alert policy for the storage systems in your test environment, and another for the storage systems in your production environment.

    Alert policies manage one type of resource only. For example, if you have SAN Volume Controller and IBM Storage FlashSystem 900 storage systems in your storage environment, you cannot have both types of resource in one alert policy.

    A resource can be managed by only one alert policy. When you add a resource to be monitored by IBM Spectrum Control, it is added to a default alert policy automatically.

    If a resource is managed by a policy, the resource cannot have alert definitions and notification settings that are independent of the policy. The alert definitions and notification settings that apply to the resource come from the policy.

    Default policies with alerts already configured are available. You can create copies of the default policies and assign resources to the new policies. The alerts in the copied policies start with the default settings.

  • Define alert conditions and notification settings for individual resources. It is not a requirement for resources to be managed by an alert policy. A resource can have its own alert definitions and notification settings, independent of an alert policy.
  • Define alerts and notification settings for applications and general groups. Use applications or general groups to manage alerts for groups of resource components such as volumes or pools. For example, you might want to define alerts on the response time for volumes in an application, depending on the response time requirements of the application. In this case, it is not useful to configure volume response time thresholds for the entire storage system because the storage system might serve many different applications with different needs.
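
The policy rules above (one resource type per policy, at most one policy per resource) can be sketched as a small data model. This is an illustrative sketch only, not the product's implementation: the names AlertPolicy, PolicyRegistry, assign, and policy_for are hypothetical, and the behavior of moving a resource out of its previous policy on reassignment is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AlertPolicy:
    """Hypothetical policy: one resource type, shared alert definitions."""
    name: str
    resource_type: str
    resources: set = field(default_factory=set)


class PolicyRegistry:
    """Hypothetical registry that tracks which policy manages each resource."""

    def __init__(self):
        self._policy_of = {}  # resource name -> AlertPolicy

    def assign(self, resource, resource_type, policy):
        # Rule: an alert policy manages one type of resource only.
        if resource_type != policy.resource_type:
            raise ValueError(
                f"{policy.name} manages {policy.resource_type}, not {resource_type}"
            )
        # Rule: a resource can be managed by only one policy.
        # Assumption: reassigning removes it from the previous policy.
        previous = self._policy_of.get(resource)
        if previous is not None:
            previous.resources.discard(resource)
        policy.resources.add(resource)
        self._policy_of[resource] = policy

    def policy_for(self, resource):
        # None means the resource is not managed by any policy and
        # can keep its own alert definitions and notification settings.
        return self._policy_of.get(resource)
```

In this sketch, assigning a FlashSystem 900 resource to a SAN Volume Controller policy raises an error, and assigning a resource to a second policy replaces the first assignment rather than adding to it.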

Triggering conditions for alerts

The conditions that trigger alert notifications depend on the type of resource that you are monitoring. In general, the following types of conditions can trigger alerts:
  • An attribute or configuration of a resource changed
  • The capacity of a resource fell outside a specified range
  • The performance of a resource fell outside a specified range
  • The storage infrastructure was changed, such as a resource being added or removed
  • Data is not being collected for a resource

For example, you can use performance thresholds to be notified when the total I/O rate for storage systems falls outside a specified range. This information can help you identify areas in your storage infrastructure that are overused or underused. IBM Spectrum Control provides many metrics for measuring performance and determining violations of the thresholds that you specify.
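
To make the idea of a value falling outside a specified range concrete, the following minimal sketch checks collected samples against a lower and an upper bound. The names RangeThreshold and find_violations, the metric label, and the sample values are hypothetical and are not part of IBM Spectrum Control.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RangeThreshold:
    """Hypothetical threshold: a metric with optional lower and upper bounds."""
    metric: str
    low: Optional[float] = None
    high: Optional[float] = None

    def violated_by(self, value: float) -> bool:
        # A value violates the threshold when it is outside [low, high].
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


def find_violations(
    threshold: RangeThreshold, samples: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (resource, value) pairs whose value is outside the range."""
    return [(name, v) for name, v in samples.items() if threshold.violated_by(v)]


# Illustrative values only: total I/O rate in operations per second.
threshold = RangeThreshold("total_io_rate", low=100.0, high=5000.0)
samples = {"system-a": 50.0, "system-b": 2500.0, "system-c": 9000.0}
violations = find_violations(threshold, samples)
# system-a is underused and system-c is overused; system-b is in range.
```

Using both bounds in one threshold reflects the two cases in the text: a rate below the range suggests an underused resource, and a rate above it suggests an overused one.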

Alert notifications and triggered actions

When an event occurs and triggers an alert, the alert is written to a log. You can also select one or more other ways to be notified of the event. These alert notifications include SNMP traps, IBM Tivoli Netcool/OMNIbus events, login notifications, entries in the Windows event log or UNIX syslog, and emails. Additionally, if a Storage Resource agent is deployed on a monitored server, you can run a script or start an IBM Storage Protect job in response to the alert.
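
The notification flow described above, in which the log entry is always written and other destinations are optional, might be modeled as follows. This is a sketch under stated assumptions: dispatch_alert and the channel names are hypothetical, and real destinations such as SNMP or email are represented here by plain callables.

```python
from typing import Callable, Dict, Iterable, List


def dispatch_alert(
    message: str,
    alert_log: List[str],
    channels: Dict[str, Callable[[str], None]],
    selected: Iterable[str] = (),
) -> List[str]:
    """Record the alert in the log, then send it to each selected channel."""
    alert_log.append(message)  # the log entry is always written
    delivered = ["log"]
    for name in selected:
        # Raises KeyError if the selected channel was never configured.
        channels[name](message)
        delivered.append(name)
    return delivered
```

The lookup failure for an unconfigured channel mirrors the requirement that alert destinations be configured before an alert uses them.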

Acknowledging alerts: Some alerts are triggered by conditions that commonly occur and can be ignored. In such cases, you acknowledge these alerts to indicate that they were reviewed and do not require immediate resolution. By acknowledging alerts, you can more quickly identify other alerts that must be reviewed and resolved.

Event processing

Alerts are generated when particular conditions are detected during data collection and event processing. For some storage systems, such as IBM Storage Accelerate and XIV, events are polled from the resource every minute.

For other resources, events are subscription-based: the resource itself or a data source such as an SMI-S provider (also called a CIM agent or CIMOM) sends the events to IBM Spectrum Control when conditions change on the resource. Examples of storage systems that use subscription-based event processing include SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, and IBM Storage FlashSystem V9000. For these storage systems, a probe is automatically run when many events are received from the storage system in a short time period. To avoid performance bottlenecks, these probes are run no more often than once every 20 minutes.
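
The probe behavior for subscription-based events can be pictured as a burst detector combined with a minimum interval between probes. The sketch below is illustrative only. The 20-minute interval comes from the text, but the class name ProbeThrottle, the burst size, and the burst window are assumptions, because the text does not quantify "many events" or "a short time period".

```python
import time
from collections import deque


class ProbeThrottle:
    """Hypothetical throttle: probe on an event burst, at most every 20 minutes."""

    def __init__(self, burst_size=10, burst_window_s=60.0,
                 min_interval_s=20 * 60, clock=time.monotonic):
        self.burst_size = burst_size          # assumed meaning of "many events"
        self.burst_window_s = burst_window_s  # assumed "short time period"
        self.min_interval_s = min_interval_s  # 20 minutes, from the text
        self._clock = clock
        self._events = deque()
        self._last_probe = None

    def on_event(self):
        """Record one event; return True if a probe should run now."""
        now = self._clock()
        self._events.append(now)
        # Drop events that are older than the burst window.
        while self._events and now - self._events[0] > self.burst_window_s:
            self._events.popleft()
        if len(self._events) < self.burst_size:
            return False
        # Enforce the minimum interval between probes.
        if (self._last_probe is not None
                and now - self._last_probe < self.min_interval_s):
            return False
        self._last_probe = now
        self._events.clear()
        return True
```

Passing the clock as a parameter makes the throttle easy to exercise with a simulated time source instead of waiting 20 minutes.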

Prerequisites for using alerts

The following conditions must be met to successfully use alerts:
  • Data collection is configured and scheduled to run regularly. For example, to detect violations of performance thresholds, you must run performance monitors to collect performance data about resources. Running performance monitors regularly also helps to establish a history of performance for trending analysis.
  • If you want to be notified about an alert in some way other than an entry in the log file, such as using SNMP traps, IBM Tivoli Netcool/OMNIbus events, or email, you must configure those alert destinations before you use the alert.
  • If an alert is triggered based on an SNMP trap from the monitored resource, you must properly configure the SNMP server of the monitored resource so that IBM Spectrum Control can listen for its SNMP traps. The default port number is 162, and the default community is public.