Setting the unhealthy host response

In rare situations, excessive load or paging on a single host in the cluster can affect throughput or availability in the rest of the cluster. The standard response to this situation is to make configuration or hardware changes to increase memory; however, if that is not possible, you can also specify that automatic corrective action is taken to prevent any impact on the cluster.

Before you begin

To perform this task, you need to be the Db2® cluster services administrator.

About this task

By default, Db2 cluster services takes no action if this unhealthy host problem occurs. No information related to the unhealthy host feature is written to the log, and you must use other tools to monitor the relevant system details on the affected hosts. This task describes how to specify the unhealthy host response that is triggered when the problem is detected. You can configure one of the following two actions:
reboot the host
This reboots the host, forcing any member on that host to restart in light mode on another host and any primary cluster caching facility to fail over the primary role to the secondary cluster caching facility. After the reboot is complete, any resident member restarts on that host (unless automatic failback is disabled), and any cluster caching facility restarts as the secondary cluster caching facility. Because this provides the strongest assurance that the issues on the unhealthy host will have the least impact on the rest of the cluster, this is the preferred option.
offline the member or cluster caching facility
This takes any member or cluster caching facility on the host offline. All processes of the member are stopped, and the member restarts in light mode on another host. If the primary cluster caching facility is on that host, the secondary cluster caching facility takes over the primary role. An alert is raised that must be cleared manually before the member can fail back or the cluster caching facility can start again. This option is suitable if there are other things running on the host that you do not want to be affected, or if you want to keep the host available for diagnosing the problem. The fact that the alert must be cleared manually can also be a reason for choosing this option.
Setting this option is a last-resort response for preventing unplanned cluster issues. If you suspect that this unhealthy host state might occur, the best response is to add memory resources to the host.

Procedure

To set up the unhealthy host response:
  • If you want to automatically reboot the host, issue the following command:
    db2cluster -cm -create -unhealthy_host_response -option -reboot_host -option -apply_to_current_host
  • If you want to automatically take any member or cluster caching facility on the host offline, issue the following command:
    db2cluster -cm -create -unhealthy_host_response -option -offline_member -option -apply_to_current_host
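The two commands above differ only in the action option. As a minimal sketch, the following shell helper prints the command for the chosen response so that you can review it before running it yourself. The function name unhealthy_host_cmd and the keywords reboot and offline are hypothetical and not part of Db2; only the db2cluster command line itself is taken from this task.

```shell
#!/bin/sh
# Hypothetical helper (not part of Db2): prints the db2cluster command
# from this task for the chosen unhealthy host response.
# The keywords "reboot" and "offline" are illustrative names only.
unhealthy_host_cmd() {
    case "$1" in
        reboot)  action="-reboot_host" ;;      # reboot the host
        offline) action="-offline_member" ;;   # offline the member or cluster caching facility
        *)
            echo "usage: unhealthy_host_cmd reboot|offline" >&2
            return 1
            ;;
    esac
    echo "db2cluster -cm -create -unhealthy_host_response -option $action -option -apply_to_current_host"
}

# Print the command for review only; nothing is executed against the cluster.
unhealthy_host_cmd reboot
```

The helper deliberately only prints the command. Running it requires Db2 cluster services administrator authority, as described in Before you begin.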

Results

If you specified host reboot as the automatic response and the host is successfully rebooted, any resident member is restarted on the host, unless automatic failback is disabled. Any cluster caching facility on the host is also restarted on the host, but if it was previously the primary cluster caching facility, it is now the secondary cluster caching facility. Information about the reboot event is written to the syslog file. For more information, see the Related links section.

If you specified that the member or cluster caching facility should be taken offline and this occurs, an alert is raised on the member or cluster caching facility. The offlined member will not restart on or fail back to its home host until this alert is cleared. The offlined cluster caching facility will not restart until this alert is cleared.