Understanding how the Db2 pureScale Feature automatically handles failure

When a failure occurs in the Db2 pureScale instance, Db2 cluster services automatically attempts to restart the failed resources. When and where the restart occurs depends on several factors, such as the type of resource that failed and the point in the resource life cycle at which the failure occurred.

If a software or hardware failure on a host causes a Db2 member to fail, Db2 cluster services automatically restarts the member. A Db2 member can be restarted either on the same host (local restart) or, if that fails, on a different host (member restart in restart light mode). Restarting a member on another host is called failover.
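The placement order described above (local restart first, failover in restart light mode second) can be modeled as a small decision function. This is a minimal sketch, not Db2 code: the `Member` class, the `restart_member` function, the mode labels, and the host-availability dictionary are all hypothetical, and the sketch assumes that a local restart succeeds whenever the home host is available.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Member:
    # Hypothetical model of a Db2 member; not a Db2 API.
    member_id: int
    home_host: str
    current_host: Optional[str] = None
    mode: str = "failed"  # hypothetical labels: "normal", "restart_light", "failed"


def restart_member(member: Member, host_up: Dict[str, bool]) -> Member:
    # 1. Local restart: assume it succeeds if the home host is available.
    if host_up.get(member.home_host, False):
        member.current_host = member.home_host
        member.mode = "normal"
        return member
    # 2. Failover: use another available host (the home host of another
    #    member) and run there in restart light mode, for crash recovery only.
    for host, up in host_up.items():
        if up and host != member.home_host:
            member.current_host = host
            member.mode = "restart_light"
            return member
    # 3. No host is available, so the member stays down.
    member.current_host = None
    member.mode = "failed"
    return member
```

For example, with `{"hostA": False, "hostB": True}`, a member whose home host is `hostA` ends up on `hostB` in `restart_light` mode. Keeping the local restart as the first choice mirrors the text: failover is attempted only when restarting in place is not possible.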

Member restart includes restarting failed Db2 processes and performing member crash recovery (undoing or reapplying logged transactions) to roll back any inflight transactions and to free any locks held by them. Member restart also ensures that updated pages have been written to the CF. When a member is restarted on a different host in restart light mode, minimal resources are used on the new host (which is the home host of another Db2 member). A member running in restart light mode does not process new transactions, because its sole purpose is to perform member crash recovery.

The databases on the failed member are recovered to a point of consistency as quickly as possible. This enables other active members to access and change database objects that were locked by the abnormally terminated member. All inflight transactions from the failed member are rolled back, and all locks that were held at the time of the abnormal termination of the member are released. Although a member in restart light mode does not accept new transactions, it remains available for resolving indoubt transactions.
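The effect of member crash recovery on data and locks can be illustrated with a textbook redo/undo pass over a log. This is a toy sketch under stated assumptions, not how Db2 implements recovery: the log record layout, the `member_crash_recovery` function, and the page and lock dictionaries are hypothetical, and indoubt (prepared) transactions are not modeled.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Hypothetical log record layout: (txn_id, kind, key, before, after),
# where kind is "update" or "commit".
LogRecord = Tuple[str, str, Optional[str], Any, Any]


def member_crash_recovery(
    log: List[LogRecord],
    pages: Dict[str, Any],
    locks: Dict[str, str],
) -> Set[str]:
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    # Redo: reapply every logged update in log order.
    for _txn, kind, key, _before, after in log:
        if kind == "update":
            pages[key] = after
    # Undo: roll back updates of inflight (uncommitted) transactions,
    # scanning the log backwards.
    inflight = {rec[0] for rec in log if rec[1] == "update"} - committed
    for txn, kind, key, before, _after in reversed(log):
        if kind == "update" and txn in inflight:
            pages[key] = before
    # Release every lock still held by a rolled-back transaction so
    # that other members can access those objects again.
    for key in [k for k, owner in locks.items() if owner in inflight]:
        del locks[key]
    return inflight
```

With a log in which `T1` commits and `T2` does not, `T1`'s update survives, `T2`'s update is undone, and `T2`'s locks are freed. This matches the outcome the text describes: a consistent state, no inflight work, and no stale locks.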

When a Db2 member has failed over to a new host, the total processing capability of the whole cluster is reduced temporarily. When the home host is active and available again, the Db2 member automatically fails back to the home host, and the Db2 member is restarted on its home host. The cluster's processing capability is restored as soon as the Db2 member has failed back and restarted on its home host. Transactions on all other Db2 members are not affected during the failback process.
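Failback and the temporary loss of capacity can be sketched the same way. This is a minimal model, not Db2 behavior in detail: the `Member` class, the `processing_capacity` and `failback` functions, and the mode labels are hypothetical. Capacity is simplified to the number of members that accept new transactions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    # Hypothetical model of a Db2 member; not a Db2 API.
    member_id: int
    home_host: str
    current_host: Optional[str]
    mode: str  # hypothetical labels: "normal" or "restart_light"


def processing_capacity(members: List[Member]) -> int:
    # Members in restart light mode do not accept new transactions.
    return sum(1 for m in members if m.mode == "normal")


def failback(members: List[Member], host_up: Dict[str, bool]) -> List[int]:
    moved = []
    for m in members:
        if m.mode == "restart_light" and host_up.get(m.home_host, False):
            # Restart the member on its home host; other members are untouched.
            m.current_host = m.home_host
            m.mode = "normal"
            moved.append(m.member_id)
    return moved
```

In this model, capacity stays reduced while a member runs in restart light mode on a guest host, returns to full once `failback` moves the member home, and members already running normally are never modified.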