What to check after running the system recovery

Several tasks must be completed before you use the system.

The recovery procedure re-creates the old system from the quorum data. However, some things cannot be restored, such as cached data or system data that manages in-flight I/O. The loss of in-flight I/O state affects RAID arrays that manage internal storage. The detailed map of where data is out of synchronization is lost, meaning that all parity information must be restored, and mirrored pairs must be brought back into synchronization. Normally, this action results in the use of old or stale data, so only writes that were in flight are affected. However, if the array had lost redundancy (for example, it had a RAID status of syncing, degraded, or critical) before the error that requires system recovery, the situation is more severe. In this situation, check the internal storage:
  • Parity arrays are likely syncing to restore parity; they do not have redundancy while this operation proceeds.
  • Because there is no redundancy in this process, bad blocks might be created where data is not accessible.
  • Parity arrays might be marked as corrupted. This identification indicates that the extent of lost data is wider than in-flight I/O; to bring the array online, the data loss must be acknowledged.
  • RAID 6 arrays that were degraded before the system recovery might require a full restore from backup. For this reason, it is important to have at least a capacity match spare available.
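The triage rule above can be sketched in Python. This is a minimal illustration, not part of the product: the function name, the dictionary layout, and the sample array names are hypothetical, and only the status values syncing, degraded, and critical come from the text. It assumes that you recorded the RAID status of each array before the failure.

```python
# Statuses that, per the text above, mean the array had already lost
# redundancy before the error that required system recovery.
NO_REDUNDANCY_STATUSES = {"syncing", "degraded", "critical"}


def arrays_needing_check(pre_failure_status):
    """Return the sorted names of arrays whose internal storage needs checking.

    pre_failure_status: dict that maps a (hypothetical) array name to the
    RAID status that was recorded before the failure.
    """
    return sorted(
        name
        for name, status in pre_failure_status.items()
        if status.lower() in NO_REDUNDANCY_STATUSES
    )


# Hypothetical example data, not output from a real system.
recorded = {"mdisk0": "online", "mdisk1": "degraded", "mdisk2": "syncing"}
print(arrays_needing_check(recorded))
```

Arrays that the function returns are the ones to inspect for bad blocks, corruption markers, or a possible restore from backup.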
Be aware of these differences about the recovered configuration:
  • FlashCopy® mappings are restored as idle_or_copied with 0% progress. Both volumes are restored to their original I/O groups.
  • The system ID is different. Any scripts or associated programs that refer to the system-management ID of the system must be changed.
  • Any FlashCopy mappings that were not in the idle_or_copied state with 100% progress at the point of disaster have inconsistent data on their target disks. These mappings must be restarted.
  • Intersystem partnerships and relationships are not restored and must be re-created manually.
  • Consistency groups are not restored and must be re-created manually.
  • Intrasystem Metro Mirror relationships are restored if all dependencies were successfully restored to their original I/O groups.
  • If hardware was replaced before the recovery, the SSL certificate might not be restored. If it is not restored, then a new self-signed certificate is generated with a validity of 30 days. Follow the associated Directed Maintenance Procedures (DMP) for a permanent resolution.
  • The system time zone might not be restored.
  • Any Global Mirror secondary volumes on the recovered system might have inconsistent data if replication I/O from the primary volume is cached on the secondary system at the point of the disaster. A full synchronization is required when re-creating and restarting these relationships.
  • Any volumes that are being formatted when a system failure occurs are set to the "formating_corrupt" state by a system recovery and are taken offline. The recovervdisk CLI command must be used to recover the volume, synchronize it with a synchronized copy, and bring it back online.
  • After the system recovery process completes, the disks are initially set to their entire real capacity. When I/O resumes, the capacity is determined and adjusted to reflect the correct value.
    Similar behavior occurs when you use the -autoexpand option on volumes. The real capacity of a disk might increase slightly because of the same kind of behavior that affects compressed volumes. Again, the capacity shrinks as I/O to the disk resumes.
  • Distributed RAID 1 rebuild in place synchronizes data between data strip mirrors, where possible. This synchronization can be observed through the lsarraymemberprogress command.
  • If the system recovery occurs during a nondisruptive system migration, recovery of system data is dependent on the point in the migration process when the system recovery action occurred. For more information, see Verifying migration volumes after a system recovery.
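The FlashCopy rule in the list above (only mappings that were idle_or_copied with 100% progress at the point of disaster have consistent targets) can be sketched in Python. This is an illustration under stated assumptions: the function name, the record layout, and the mapping names are hypothetical, and the status value copying in the sample data is only an example of a state other than idle_or_copied. It assumes that you kept a record of mapping states from before the disaster, because the recovered mappings all report idle_or_copied with 0% progress.

```python
def mappings_to_restart(pre_disaster_mappings):
    """Return the names of FlashCopy mappings that must be restarted.

    A mapping has consistent target data only if it was idle_or_copied
    with 100% progress at the point of disaster; any other mapping is
    returned for restart.
    """
    return [
        m["name"]
        for m in pre_disaster_mappings
        if not (m["status"] == "idle_or_copied" and m["progress"] == 100)
    ]


# Hypothetical pre-disaster record, not output from a real system.
snapshot = [
    {"name": "fcmap0", "status": "idle_or_copied", "progress": 100},
    {"name": "fcmap1", "status": "copying", "progress": 40},
    {"name": "fcmap2", "status": "idle_or_copied", "progress": 0},
]
print(mappings_to_restart(snapshot))
```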
For Virtual Volumes (VVols), complete the following tasks.
  • After you confirm that the system recovery (T3) completed successfully, restart Spectrum Control Base (SCB) services. Use the Spectrum Control Base command service ibm_spectrum_control start.
  • Refresh the storage system information on the SCB GUI to ensure that the systems are in sync after the recovery.
    • To complete this task, log in to the SCB GUI.
    • Hover over the affected storage system, select the menu launcher, and then select Refresh. This step repopulates the system.
    • Repeat this step for all Spectrum Control Base instances.
  • Rescan the storage providers from within the vSphere Web Client.
    • Select vCSA > Manage > Storage Providers > select Active VP > Re-scan icon.
For Virtual Volumes (VVols), also be aware of the following information.
  • FlashCopy mappings are not restored for VVols. The implications are as follows.
    • The mappings that describe the VM's snapshot relationships are lost. However, the Virtual Volumes that are associated with these snapshots still exist, and the snapshots might still appear on the vSphere Web Client. This outcome might have implications for your VMware backup solution.
      • Do not attempt to revert to snapshots.
      • Use the vSphere Web Client to delete any snapshots for VMs on a VVol data store to free up disk space that is being used unnecessarily.
    • The targets of any outstanding 'clone' FlashCopy relationships might not function as expected (even if the vSphere Web Client recently reported clone operations as complete). For any VMs that are targets of recent clone operations, complete the following tasks.
      • Complete data integrity checks as recommended for conventional volumes.
      • If clones do not function as expected or show signs of corrupted data, take a fresh clone of the source VM to ensure that data integrity is maintained.