Group restart and crash recovery

Group restart is the process of restarting the entire Db2® pureScale® instance. The database server processes for all members and cluster caching facilities are restarted, and a group crash recovery is performed to bring the database back online.

A group restart occurs when there is no viable primary cluster caching facility in the Db2 pureScale instance. This event is automatically detected and handled by Db2 cluster services. Group restart is initiated automatically as soon as a primary cluster caching facility and a member become available. While group restart is in progress, the database is inaccessible across all members.

The following situations can lead to a group restart:
  • The instance is running with only one cluster caching facility, and that cluster caching facility fails.
  • The primary cluster caching facility fails before the secondary cluster caching facility has reached PEER state.
  • Both cluster caching facilities fail.
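
The three situations above reduce to one question: after the primary fails, is there a secondary in PEER state that can take over? The following is a minimal sketch of that decision logic only. The names CFState and group_restart_needed are hypothetical and are not part of any Db2 interface; Db2 cluster services makes this determination internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CFState:
    """Illustrative (hypothetical) view of one cluster caching facility."""
    failed: bool
    in_peer_state: bool = False  # only meaningful for the secondary


def group_restart_needed(primary: CFState, secondary: Optional[CFState]) -> bool:
    """Return True if no viable primary cluster caching facility remains."""
    if not primary.failed:
        return False  # the primary is still viable
    if secondary is None:
        return True   # only one cluster caching facility, and it failed
    if secondary.failed:
        return True   # both cluster caching facilities failed
    # The primary failed: the secondary can take over only from PEER state.
    return not secondary.in_peer_state
```

For example, group_restart_needed(CFState(failed=True), CFState(failed=False, in_peer_state=True)) returns False, because the secondary can take over as primary without a group restart.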

Group crash recovery

Group crash recovery is responsible for making the database consistent by redoing any work that had not been written to disk and rolling back any transactions that had not been committed at the time of the failure. Group crash recovery is similar to Db2 crash recovery outside of a Db2 pureScale environment, but it uses the merged log streams from all members active on the database. Because group crash recovery occurs automatically as part of group restart, users generally do not have to take any action when a group crash recovery is required, provided that a functioning cluster manager is present.
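
The redo and rollback phases over merged log streams can be illustrated with a toy model. This is a conceptual sketch only, not how Db2 implements recovery: the record layout (seq, txn, op, key, old, new), the globally ordered sequence number, and the function name group_crash_recovery are all assumptions made for the example.

```python
import heapq


def group_crash_recovery(member_logs, disk):
    """Toy redo/undo pass over merged per-member log streams.

    member_logs: one list per member, each sorted by sequence number.
    Each record is a tuple (seq, txn, op, key, old, new), where op is
    "write" or "commit". This layout is hypothetical.
    disk: dict holding the on-disk state at the time of the failure.
    """
    # Merge the per-member streams into one stream ordered by sequence number.
    merged = list(heapq.merge(*member_logs, key=lambda rec: rec[0]))
    committed = {rec[1] for rec in merged if rec[2] == "commit"}

    db = dict(disk)
    # Redo phase: reapply every logged write in log order.
    for seq, txn, op, key, old, new in merged:
        if op == "write":
            db[key] = new
    # Undo phase: roll back writes of uncommitted transactions, newest first.
    for seq, txn, op, key, old, new in reversed(merged):
        if op == "write" and txn not in committed:
            if old is None:
                db.pop(key, None)  # the key did not exist before the write
            else:
                db[key] = old
    return db


# Member 0 committed T1; member 1 was still running T2 when the failure hit.
member0 = [(1, "T1", "write", "a", None, 10), (3, "T1", "commit", None, None, None)]
member1 = [(2, "T2", "write", "b", 5, 20)]
recovered = group_crash_recovery([member0, member1], {"b": 5})
```

In this example, the committed write of T1 is preserved and the uncommitted write of T2 is rolled back to its prior value.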