What to check after running the system recovery

Several tasks must be completed before you use the system.

The recovery procedure re-creates the old system from the quorum data. However, some things cannot be restored, such as cached data or system data that manages in-flight I/O. This latter loss of state affects RAID arrays that manage internal storage. The detailed map of where data is out of synchronization is lost, which means that all parity information must be restored and mirrored pairs must be brought back into synchronization. Normally, this action results in the use of either old or stale data, so only writes that were in flight are affected. However, if the array lost redundancy (for example, a syncing, degraded, or critical RAID status) before the error that requires system recovery, the situation is more severe. In this situation, you must check the internal storage:
  • Parity arrays are likely syncing to restore parity; they do not have redundancy when this operation proceeds.
  • Because there is no redundancy in this process, bad blocks might be created where data is not accessible.
  • Parity arrays might be marked as corrupted. This identification indicates that the extent of lost data is wider than in-flight I/O; to bring the array online, the data loss must be acknowledged.
  • RAID 6 arrays that were degraded before the system recovery might require a full restore from backup. For this reason, it is important to have at least one spare of matching capacity available.
Be aware of the following differences in the recovered configuration:
  • FlashCopy® mappings are restored as idle_or_copied with 0% progress. Both volumes are restored to their original I/O groups. If FlashCopy mappings to volumes exist in a Safeguarded backup location, those FlashCopy mappings are not restored during system recovery. These FlashCopy mappings are typically created automatically by an external scheduler. FlashCopy mappings to new Safeguarded backups are created at the next scheduled time that is defined by the Safeguarded policy.
  • System recovery makes the current Safeguarded backups invalid. Therefore, as part of the recovery process, all Safeguarded backups are deleted. After recovery is complete, the external scheduler creates new Safeguarded backups based on the policy that was created when the Safeguarded Copy function was configured.
  • Snapshots are not recovered (the VDisks that are used for snapshots are deleted asynchronously). Clone and thin-clone volumes are recovered as volumes of the correct type in a volume group, but the volumes are corrupted. When the recovervdisk command is run against these volumes, the corrupted status is fixed, but the data is not recovered because the snapshots are no longer available. The user must delete and re-create these volume groups when the appropriate snapshots are available for repopulation.
  • As part of the recovery process, current snapshots become invalid and all snapshots are deleted. After the process is complete, snapshots are created based on the schedule of the policy that was defined when the snapshot was configured.
    Note: Snapshots might not be created if volumes are still offline or corrupted in the source volume group.
  • As part of the recovery process, the CHAP secret for iSCSI hosts can be entered manually, and the hosts should then be able to log in.
  • As part of the recovery process, partnerships with remote systems are recovered. However, because the system ID of the recovered system has changed, the partnership configuration on remote systems must be updated with the new system ID. To update the new ID on each remote system, the partnership must be in a stopped state. The chpartnership -newclusterid CLI command must then be used on each remote system to update the partnership. After the system ID is updated, the partnerships can be started. For a sample sequence, see the sketch after this list.
  • The system recovers the replication configuration for any volume groups that use policy-based replication. However, a full resynchronization of the volume groups occurs when replication is restarted. For more information, see the policy-based replication documentation.
  • The system ID is different. Any scripts or associated programs that refer to the system-management ID of the system must be changed.
  • Any FlashCopy mappings that were not in the idle_or_copied state with 100% progress at the point of disaster have inconsistent data on their target disks. These mappings must be restarted; see the example after this list.
  • Volumes with cloud snapshots that were enabled before the recovery need to have the cloud snapshots manually re-enabled.
  • If hardware was replaced before the recovery, the SSL certificate might not be restored. If it is not restored and the system previously used a CA-signed certificate, the system creates an internally signed certificate that is signed by the internal root CA, even if the previous certificate was signed by a trusted third-party CA. If a self-signed certificate was used before the system recovery, the system creates a new self-signed certificate with 30 days of validity.
  • The system time zone might not be restored.
  • Any volumes that were being formatted when the system failure occurred are set to the "formatting_corrupt" state by a system recovery and are taken offline. The recovervdisk CLI command must be used to recover the volume, synchronize it with a synchronized copy, and bring it back online; see the example after this list.
  • After the system recovery process completes, the disks are initially reported with their entire real capacity. When I/O resumes, the used capacity is determined and adjusted to reflect the correct value.
    Similar behavior occurs when you use the -autoexpand option on volumes. The real capacity of a disk might increase slightly because of the same behavior that affects compressed volumes. Again, the capacity shrinks back as I/O to the disk resumes.
  • Distributed RAID 1 rebuild in place synchronizes data between data strip mirrors, where possible. This synchronization can be observed through the lsarraymemberprogress command.
  • If the system recovery occurs during a nondisruptive system migration, recovery of system data is dependent on the point in the migration process when the system recovery action occurred. For more information, see Verifying migration volumes after a system recovery.
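The following sketch shows one possible sequence for the partnership update that is described in the list above, run on each remote system. The partnership name (recovered_system) and the system ID value are hypothetical placeholders; confirm the exact chpartnership syntax for your code level before you run it.
    chpartnership -stop recovered_system
    chpartnership -newclusterid 0000020064C0DEF0 recovered_system
    chpartnership -start recovered_system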
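To restart FlashCopy mappings that have inconsistent target data, one possible CLI approach is sketched below. The mapping name (fcmap0) is a hypothetical placeholder, and whether the -prep parameter is appropriate depends on your configuration; check the mapping states with lsfcmap first.
    lsfcmap
    startfcmap -prep fcmap0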
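To recover volumes that a system recovery left offline or corrupted, a minimal sketch follows. The volume name (vdisk7) is a hypothetical placeholder; lsvdisk can be used first to identify the affected volumes.
    lsvdisk -filtervalue status=offline
    recovervdisk vdisk7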
Before you use the volumes, complete the following tasks.
  • Start the host systems.
  • Manual actions might be necessary on the hosts to trigger them to rescan for devices. You can complete this task by disconnecting and reconnecting the Fibre Channel cables to each host bus adapter (HBA) port, or by triggering a rescan from the host operating system; see the example after this list.
  • Verify that all mapped volumes can be accessed by the hosts.
  • Run file system consistency checks.
    Note: Any data that was in the system write cache at the time of the failure is lost.
  • Run the application consistency checks.
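As an alternative to reseating the Fibre Channel cables, a device rescan can often be triggered from the host operating system. For example, on a Linux host with the sg3_utils package installed, the following command rescans the SCSI bus; the exact procedure varies by operating system, HBA driver, and multipath configuration.
    rescan-scsi-bus.sh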
For VMware Virtual Volumes (vVols), complete the following tasks.
  • After you confirm that the T3 completed successfully, restart the embedded VASA provider service using the command: satask restartservice -service nginx.
  • Rescan the storage providers by using the vSphere Client.
    • Select vCSA > Configure > Storage Providers, select the storage provider, and click the rescan action.
For VMware Virtual Volumes (vVols), also be aware of the following information.
  • FlashCopy mappings are not restored for vVols. The implications are as follows.
    • The mappings that describe the VM's snapshot relationships are lost. However, the Virtual Volumes that are associated with these snapshots still exist, and the snapshots might still appear in the vSphere Client. This outcome might have implications for your VMware backup solution.
      • Do not attempt to revert to snapshots.
      • Use the vSphere Client to delete any snapshots for VMs on a vVol data store to free up disk space that is being used unnecessarily.
    • The targets of any outstanding 'clone' FlashCopy mappings might not function as expected (even if the vSphere Client previously reported recent clone operations as complete). For any VMs that are targets of recent clone operations, complete the following tasks.
      • Complete data integrity checks as is recommended for conventional volumes.
      • If clones do not function as expected or show signs of corrupted data, take a fresh clone of the source VM to ensure that data integrity is maintained.