Threshold monitoring for system health

Threshold monitoring is a service that helps you identify performance issues by defining threshold rules for selected performance metrics. For more information on performance metrics, see the Configuring the performance monitoring tool and List of performance metrics sections.

You can configure IBM Spectrum Scale to raise events when certain thresholds are reached. As soon as one of the metric values exceeds a threshold limit, the system health daemon receives an event notification from the monitoring process. Then, the health daemon generates a log event and updates the health status of the corresponding component.

You can set the following two types of threshold levels for the data that is collected through performance monitoring sensors:
Warning level
When the data that is being monitored reaches the warning level, the system raises an event with severity Warning. When the observed value returns to within the current threshold level, the system removes the warning.
Error level
When the data that is being monitored reaches the error level, the system raises an event with severity Error. When the observed value returns to within the current threshold level, the system removes the error state.
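
The following Python sketch illustrates how a rule with a warning level and an error level might map observed values to a severity. It is not IBM Spectrum Scale code: the ThresholdRule class, the metric name, and the level values are hypothetical, and the sketch assumes a rule where higher values are worse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: a metric name plus optional warning and error levels."""
    metric: str
    warn_level: Optional[float] = None
    error_level: Optional[float] = None

    def evaluate(self, value: float) -> str:
        """Return HEALTHY, WARNING, or ERROR for one observed value.

        Assumes higher values are worse. The error level is checked
        first so that it takes precedence over the warning level.
        """
        if self.error_level is not None and value >= self.error_level:
            return "ERROR"
        if self.warn_level is not None and value >= self.warn_level:
            return "WARNING"
        return "HEALTHY"


# Illustrative values only: warn at 80, error at 90.
rule = ThresholdRule(metric="example_pool_utilization",
                     warn_level=80.0, error_level=90.0)

# The last sample falls back below both levels, so the state clears.
states = [rule.evaluate(v) for v in (75.0, 85.0, 95.0, 70.0)]
print(states)  # ['HEALTHY', 'WARNING', 'ERROR', 'HEALTHY']
```

In this sketch the state is recomputed from each new sample, so a warning or error clears as soon as the observed value drops back below the corresponding level.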