Monitoring Db2 Mirror resources
The health center is a feature of Db2® Mirror that is used to continuously monitor for issues in the Db2 Mirror environment and act when an issue is detected.
The health of the Db2 Mirror environment is determined by monitoring two classes of resources: critical resources and product-related resources. Critical resources, such as storage and communication, are checked more frequently than product-related resources. When an issue is detected, the health center either suspends replication or notifies the operator by sending an alert to the QSYSOPR message queue. A user can configure how frequently the Db2 Mirror environment checks these resources, along with other health center properties.
Health center controls
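The current health center property values can be retrieved by querying QSYS2.MIRROR_HEALTH_MONITOR_INFO:
SELECT * FROM QSYS2.MIRROR_HEALTH_MONITOR_INFO;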
Resource check intervals
Two interval properties control how frequently the health center monitors the Db2 Mirror environment. One property sets the time between successive critical resource checks, and the other sets the time between successive product-related resource checks. Changing these interval properties affects both nodes.
Critical resource checks
Critical resource checks include the following:
- Checking the status of the RDMA communication links between the nodes.
- Checking the status of disk unit connectivity.
- Checking storage availability. For more information, see Storage threshold monitoring.
If a communication or storage outage is detected, the health center suspends replication as an unplanned outage. Unplanned outages are eligible for automatic takeover. For more information, see Automatic takeover. When replication is suspended due to available storage threshold monitoring, it is not treated as an unplanned outage and therefore is not eligible for automatic takeover.
When the health center detects a communication outage, it does not immediately suspend replication as an unplanned outage. Instead, it waits for the number of seconds that is specified by the suspend wait time interval property before it suspends replication. This property accepts values from 15 to 3600 seconds (60 minutes). To prevent the health center from suspending replication because of a communication outage, set the suspend wait time interval property to NOMAX.
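For example, the following call sets the critical resource check interval to 30 and the suspend wait time to 300 seconds:
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(CRITICAL_RESOURCE_CHECK_INTERVAL => 30, SUSPEND_WAIT_TIME => 300);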
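The NOMAX setting described above could be applied with a call along the following lines. This is a sketch, not confirmed syntax: it assumes that NOMAX is passed as a quoted string, in the same way that 'NONE' is passed for other health center properties in this topic.

```sql
-- Assumption: the special value NOMAX is supplied as a character string.
-- With this setting, a communication outage alone does not cause the
-- health center to suspend replication.
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(SUSPEND_WAIT_TIME => 'NOMAX');
```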
Storage threshold monitoring
The health center monitors the amount of storage available in each auxiliary storage pool (ASP). The monitoring is performed for each ASP that is part of SYSBAS and for each registered database IASP. When the percentage of available storage for an ASP falls beneath the value of the available storage threshold percentage property, replication is suspended.
When the amount of available storage for a user ASP defined within SYSBAS falls beneath the configured threshold, then a suspend of replication is initiated regardless of the amount of available storage in other ASPs that are a part of SYSBAS. Similarly, if an ASP group has secondary ASPs, then the available storage threshold is monitored for each ASP within the ASP group. A suspend of database IASP replication is initiated when the amount of available storage of any one ASP within the ASP group falls beneath the configured threshold for the database IASP. The suspend of SYSBAS replication by the health center also suspends replication for all registered database IASPs.
If a suspend of replication is initiated because an ASP falls beneath the configured threshold, the issue must be resolved before active replication can be resumed. Resolving the issue might involve deleting objects to make more storage available or changing the available storage threshold property. You have two options for removing replicated objects on the node that is blocked while replication is suspended. The first option is to exclude the objects from replication and then delete them. The second option is to change the available storage threshold property to a value that allows active replication to resume. After active replication is resumed, replicated objects can be deleted on the secondary node.
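For example, the following calls set the available storage threshold to 5.00 percent for SYSBAS and to NONE for the database IASP DBIASP1:
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(IASP_NAME => '*SYSBAS', AVAILABLE_STORAGE_THRESHOLD => 5.00);
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(IASP_NAME => 'DBIASP1', AVAILABLE_STORAGE_THRESHOLD => 'NONE');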
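Before choosing a threshold, it can help to see how much storage each ASP currently has available. The following query is a minimal sketch that assumes the general-purpose IBM i service QSYS2.ASP_INFO, which is not part of this topic, and its TOTAL_CAPACITY and TOTAL_CAPACITY_AVAILABLE columns. The percentage that the health center computes internally might be derived differently.

```sql
-- List each ASP with its approximate percentage of available storage,
-- lowest first, so the ASPs closest to a threshold appear at the top.
SELECT ASP_NUMBER,
       DEVICE_DESCRIPTION_NAME,
       DECIMAL(TOTAL_CAPACITY_AVAILABLE * 100.0 / TOTAL_CAPACITY, 5, 2)
         AS PERCENT_AVAILABLE
  FROM QSYS2.ASP_INFO
  WHERE TOTAL_CAPACITY > 0   -- avoid dividing by zero for unconfigured ASPs
  ORDER BY PERCENT_AVAILABLE;
```

Because the threshold is evaluated for each ASP individually, the lowest value in an ASP group is the one that matters, not the average across the group.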
Product resource checks
When an issue is detected with one or more of the Db2 Mirror product resources, the health center notifies the user by sending alerts to the QSYSOPR message queue. For steps on accessing Db2 Mirror alerts, see Accessing Db2 Mirror QSYSOPR messages from the Db2 Mirror GUI.
For example, the following call sets the resource check interval property to 5:
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(RESOURCE_CHECK_INTERVAL => 5);
Monitoring Db2 Mirror jobs
The health center checks the status of various Db2 Mirror system and user jobs that are required for the proper operation of the Db2 Mirror product. The jobs are monitored to verify that they are active and available for use by Db2 Mirror to manage the environment and replicate objects. The jobs that are monitored by the health center are described in Db2 Mirror jobs.
When the health center detects an issue with one of the jobs, it attempts to recover that job. If the job is successfully recovered, then a CPDC925 message is sent to the QSYSOPR message queue. The message contains the details about the type of job recovered. If the job fails to recover, then a CPDC922 message is sent to the QSYSOPR message queue. The message includes the details about the job that failed to recover. When a job fails to recover, the health center might also suspend replication as an unplanned outage if the replication of objects is jeopardized.
License expiration monitoring
The health center also monitors the license keys that are required for the Db2 Mirror product. The license expiration threshold property is set separately on each node. It defines how many days before a license key expires the health center begins to send alerts. When the health center detects an approaching license key expiration, a CPDC924 message is sent to the QSYSOPR message queue. The message contains details on how many days remain before the license key expires. Expired or missing license keys for Db2 Data Mirroring (5770SS1 Option 48) or Db2 Mirror Enablement (5770DBM Option 1) can cause Db2 Mirror nodes to remain suspended.
For example, the following calls set the license expiration threshold to 30 days and to NONE:
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(LICENSE_EXPIRATION_THRESHOLD => 30);
CALL QSYS2.CHANGE_MIRROR_HEALTH_MONITOR(LICENSE_EXPIRATION_THRESHOLD => 'NONE');
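License expiration dates can also be checked by hand on each node. The following query is a sketch that assumes the general-purpose IBM i service QSYS2.LICENSE_INFO, which is not part of this topic, and its PRODUCT_ID, FEATURE_ID, and LICENSE_EXPIRATION columns. It filters only by product ID, so it returns every feature of those products that has an expiration date, not only the two Db2 Mirror options.

```sql
-- Show how many days remain before licensed features of the two
-- products named in this topic expire. Run on each node separately.
SELECT PRODUCT_ID,
       FEATURE_ID,
       LICENSE_EXPIRATION,
       DAYS(LICENSE_EXPIRATION) - DAYS(CURRENT DATE) AS DAYS_REMAINING
  FROM QSYS2.LICENSE_INFO
  WHERE PRODUCT_ID IN ('5770SS1', '5770DBM')
    AND LICENSE_EXPIRATION IS NOT NULL   -- skip keys with no expiration date
  ORDER BY DAYS_REMAINING;
```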
Monitoring other resources
The health center also checks other less-critical product resources. These checks include ensuring that the system clocks remain synchronized and verifying that exit programs that were registered by the Db2 Mirror product remain registered. When an issue is detected for one of these less-critical resources, a CPDC924 message that indicates the type of issue is sent to the QSYSOPR message queue.
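As an alternative to the GUI, the alerts described in this section can be listed with SQL. The following query is a sketch that assumes the general-purpose IBM i service QSYS2.MESSAGE_QUEUE_INFO, which is not part of this topic, and that the QSYSOPR message queue is in library QSYS. Only the three message IDs named in this section are selected.

```sql
-- List health center alerts in QSYSOPR, newest first.
-- CPDC925: a job was recovered
-- CPDC922: a job failed to recover
-- CPDC924: license expiration or other less-critical resource issue
SELECT MESSAGE_TIMESTAMP,
       MESSAGE_ID,
       MESSAGE_TEXT
  FROM QSYS2.MESSAGE_QUEUE_INFO
  WHERE MESSAGE_QUEUE_LIBRARY = 'QSYS'
    AND MESSAGE_QUEUE_NAME = 'QSYSOPR'
    AND MESSAGE_ID IN ('CPDC922', 'CPDC924', 'CPDC925')
  ORDER BY MESSAGE_TIMESTAMP DESC;
```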