Health indicator configuration

A default health monitor configuration is provided during installation. This configuration ensures that the health monitor can evaluate the health of the database environment as soon as the database system is started. However, the health monitor's behavior in evaluating health indicators and reacting to alert states can be fine-tuned through configuration for a specific user's environment.
The health monitor SQL table functions have been deprecated.
Important: The health monitor, health indicators, and related components have been deprecated in Version 9.7 and might be removed in a future release. Health monitor is not supported in Db2® pureScale® environments. For more information, see Health monitor has been deprecated.

There are different levels at which the configuration can be defined. A default configuration of factory settings is provided for each health indicator when the database system is installed. When the health monitor starts for the first time, a copy of the factory settings provides the defaults for the instance and global settings.

Instance settings apply to the instance. Global settings apply to objects such as databases, table spaces, and table space containers in the instance that do not have customized settings defined.

Updating health indicator settings for a specific database, table space, or table space container creates object settings for the updated health indicators. The default for object settings is the global settings.

The health monitor checks the object settings when it processes a health indicator for a particular database, table space, or table space container. If the settings for a particular health indicator have never been updated, the default global settings are used to process the health indicator. The instance settings are used when the health monitor processes a health indicator for the instance.

You can alter health monitor behavior by using a number of attributes that can be configured for each health indicator. The first set of parameters (evaluation flag, thresholds, sensitivity) defines when the health monitor will generate an alert for a health indicator. The second set of parameters (action flag, actions) defines what the health monitor does upon generating the alert.
Evaluation flag
Each health indicator has an evaluation flag to enable or disable evaluation of alert state.
Warning and alarm thresholds
Threshold-based health indicators have settings defining the warning and alarm regions for the health indicator value. These warning and alarm threshold values can be modified for your particular database environment.
Sensitivity parameter
The sensitivity parameter defines the minimum amount of time, in seconds, that the health indicator value has to be in an alert state before the alert is generated. The wait time associated with the sensitivity value starts on the first refresh interval during which the health indicator value enters an alert state. You can use this value to eliminate erroneous alerts generated due to temporary spikes in resource usage.

Consider an example using the Log Utilization (db.log_util) health indicator. Suppose that you review the database notify log on a weekly basis. In the first week, an entry for db.log_util is in alarm state. You recall having received notification for this situation, but on checking for the alert situation from the CLP, the health indicator was back in normal state. After a second week, you notice a second alarm notification entry for the same health indicator at the same time of the week. You investigate activity in your database environment on the two occasions that alerts were generated, and you discover that an application that takes a long time to commit is run weekly. This application causes the log utilization to spike for a short time, approximately eight to nine minutes, until the application commits. You can see from the history entries in the alarm notification record in the notification log, that the db.log_util health indicator is evaluated every 10 minutes. Because the alert is being generated, the application time must be spanning that refresh interval. You set the sensitivity for the db.log_util parameter to ten minutes. Now every time the value of db.log_util first enters the warning or alarm threshold regions, the value must remain in that region for at least ten minutes before the alert is generated. No further notification entries are recorded in the notification log for this situation because the application only lasts eight to nine minutes.

Action flag
The running of actions on alert generation is controlled by the action flag. Only when the action flag is enabled are the configured alert actions run.
Script or task actions can be configured to run on alert occurrence. For threshold-based health indicators, actions can be configured to run on warning or alarm thresholds. For state-based health indicators, actions can be configured to run on any of the possible non-normal conditions. The database administration server must be running for actions to run.
The following input parameters are passed to every operating system command script:
  • <health indicator shortname>
  • <object name>
  • <value | state>
  • <alert type>

Script actions use the default interpreter on the operating system. If you want to use a non-default interpreter, create a task using the ADMIN_TASK_ADD procedure with the script content. In a multipartitioned environment, the script defined in the script action must be accessible by all partitions.

The refresh interval at which the health monitor checks each health indicator cannot be configured. The recommendation actions considered by the health monitor cannot be configured.

The health monitor configuration is stored in a binary file, HealthRules.reg:
  • On Windows, HealthRules.reg is stored in x:\<SQLLIB_PATH>\<INSTANCE_NAME>. For example, d:\sqllib\DB2.
  • On UNIX, HealthRules.reg is stored in ~/<SQLLIB_PATH>/cfg. For example, ~/home/sqllib/cfg.
It is possible to replicate a health monitor configuration to other database instances on a Linux®, UNIX, or Windows server. You can accomplish this replication by copying the binary configuration file to the appropriate directory location on the target instance.