Recovering from storage controller failure

When a storage controller or disk fails in a replicated file system, the system routes access to the other replica. No administrator intervention is required, and access to the file systems is not affected. After the problem is resolved and the disks become available again, you must start them manually.

Before you begin

  • To perform this task, you must be a Db2® cluster services administrator.

Procedure

  1. To determine whether a file system has any disks in the Down state, run the following command:
    db2cluster -cfs -list -filesystem <fs name> 
  2. If any disks show Down in the STATE column, restart them by running the following command:
    db2cluster -cfs -start -filesystem <fs name> -disk

    This command does not trigger replication of the data that was added while the other storage controller was down. After the command returns, new data is replicated to both replicas.

    The disk startup operation automatically repairs file system metadata and, if applicable, user files. Repairing user files restricts access to the affected files and blocks ongoing transactions that require access to them. The duration depends on how much repair is required, so run this command during an off-peak period.

  3. Verify that all disks in the file system started successfully and are in the Up state by running the following command:
    db2cluster -cfs -list -filesystem <fs name>

    This command revalidates the configuration and status of the target file system. As part of the revalidation, existing alerts are removed if the underlying condition has been resolved, and new alerts are raised if applicable.

  4. Run the following command to replicate the data that was added to the file system during the storage downtime. This operation is I/O intensive, so run it during an off-peak period.
    db2cluster -cfs -replicate -filesystem <fs name>
  5. Finally, the file system might need to be rebalanced. Run db2instance -list to check for alerts, and use db2cluster -list -alerts to view the details of any alert. If an unbalanced alert is raised, run the following command to rebalance the file system. This operation is I/O intensive, so run it during an off-peak period.
    db2cluster -cfs -rebalance -filesystem <fs name>
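
The procedure above can be sketched as a small shell script. This is an illustrative sketch only, not an IBM-supplied tool. The FS_NAME and DRY_RUN variables and the run, count_down, and recover helpers are hypothetical names introduced for this example. The count_down helper assumes that a down disk appears in the listing as the literal word Down in the STATE column; the exact layout of the listing is not described here, so check it against real output before relying on it. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the recovery sequence from the procedure above.
# FS_NAME and DRY_RUN are hypothetical variables for this example;
# they are not db2cluster options.
FS_NAME="${FS_NAME:-myfs}"   # placeholder file system name
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Count the lines on stdin that contain the whole word "Down".
# Assumption: a down disk is reported with the literal word Down.
count_down() {
    grep -cw 'Down'
}

recover() {
    # Step 1: list the disks and their states.
    run db2cluster -cfs -list -filesystem "$FS_NAME"
    # Step 2: start the disks that are down (may block access to files
    # under repair, so prefer an off-peak period).
    run db2cluster -cfs -start -filesystem "$FS_NAME" -disk
    # Step 3: list again to confirm that every disk is Up.
    run db2cluster -cfs -list -filesystem "$FS_NAME"
    # Step 4: replicate data written during the outage (I/O intensive).
    run db2cluster -cfs -replicate -filesystem "$FS_NAME"
    # Step 5: check for alerts; rebalance only if an unbalanced alert exists.
    run db2instance -list
    run db2cluster -list -alerts
    echo "If an unbalanced alert is raised, run:"
    echo "  db2cluster -cfs -rebalance -filesystem $FS_NAME"
}

recover
```

With DRY_RUN left at 1, the script prints each command with a leading "+" so that the sequence can be reviewed first. The rebalance step is deliberately left as a printed reminder, because it applies only when an unbalanced alert is raised.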