Failing back to the old primary site
The old primary site must be restored when it is repaired.
Important: For a planned or unplanned failure, the role reversal method is recommended instead of the failover and failback methods. For more information, see Role reversal.
Complete the following steps to restore the old primary site after it is repaired and back online following the disaster:
- Issue the following command on the old primary:
# mmafmctl Device failbackToPrimary -j FilesetName { --start | --stop } [--force]
The --start option restores the primary to the contents of the last RPO snapshot taken on the primary before the disaster. After failback is started with --start, the primary fileset is in read-only mode, which prevents accidental corruption until the failback process is completed. After the old primary site is functioning again, all RPO snapshots that were present before the disaster can be accessed. If the common RPO snapshot psnap0 is not present, the old primary site can be converted to a normal GPFS fileset; to set up the primary site in that case, see the steps in Failing back to the new primary site. If failbackToPrimary --start fails on the fileset, a subsequent failbackToPrimary --start might not be allowed. In that case, use the --force option to start the failback again.
While the primary is coming back online, as an administrator, ensure that no I/O occurs on the primary fileset before the failbackToPrimary --start process begins.
- Issue the following command on the old primary:
# mmafmctl Device { applyUpdates | getPrimaryId } -j FilesetName
With the applyUpdates option, this command applies to the old primary site the differences that applications created on the acting primary site after it took over the applications.
All the differences can be brought over in a single iteration or in multiple iterations. To minimize application downtime, run this command repeatedly to synchronize the contents of the old primary site with the acting primary site. When the contents of the two sites are as close as possible, stop the applications and run this command one last time. applyUpdates might fail with an error when the acting primary is overloaded; in that case, run the command again. For more information about minimizing application downtime during this step, see Failback of multiple filesets.
- On the old primary, complete the failback process by running mmafmctl with the failbackToPrimary --stop option.
After this command completes, the fileset is in read/write mode and the primary site is ready for the applications to be started. If failbackToPrimary --stop does not complete because of errors and the failback cannot be stopped, use the --force option to force it to stop.
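- Convert the acting primary site back to the secondary site, and set the primary ID. Unlink the acting primary fileset, and then change it to secondary by issuing the following command on the acting primary (the original secondary), where primaryid is the primary ID of the old primary fileset as reported by mmafmctl Device getPrimaryId -j FilesetName on the old primary: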
# mmchfileset device fileset -p afmMode=secondary -p afmPrimaryID=primaryid
Restart NFS on the secondary site, if required, so that the secondary export is accessible to the primary site. The primary and secondary sites are then connected again as they were before the disaster, and all data from the primary is replayed to the secondary site. Regular RPO snapshots also resume on the primary site.
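The failbackToPrimary --start and --stop steps can be scripted on a node of the old primary cluster. The following is a minimal sketch, not an IBM-provided tool: the function names run and afm_failback, the DRY_RUN variable, and the policy of retrying once with --force are illustrative assumptions. By default it only prints the mmafmctl commands it would issue; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper around "mmafmctl Device failbackToPrimary".
# $1 = device, $2 = fileset, $3 = start | stop.
# After --start succeeds, the primary fileset stays read-only until --stop.
# If the plain command fails, retry once with --force, which the failback
# procedure permits for both --start and --stop.
afm_failback() {
    local device=$1 fileset=$2 action=$3
    if run mmafmctl "$device" failbackToPrimary -j "$fileset" "--$action"; then
        return 0
    fi
    echo "failbackToPrimary --$action failed; retrying with --force" >&2
    run mmafmctl "$device" failbackToPrimary -j "$fileset" "--$action" --force
}
```

For example, with the hypothetical device gpfs0 and fileset fs1, afm_failback gpfs0 fs1 start prints `+ mmafmctl gpfs0 failbackToPrimary -j fs1 --start`. Forcing automatically is a convenience of this sketch; an administrator might prefer to investigate the first error before rerunning with --force.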
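Because applyUpdates is typically run several times and might fail when the acting primary is overloaded, a retry loop can help. This sketch assumes the hypothetical function name apply_updates_with_retry and the illustrative MAX_ATTEMPTS and RETRY_DELAY settings (defaults of 5 attempts and 60 seconds); none of these are product parameters. It calls mmafmctl Device applyUpdates -j FilesetName on the old primary and returns a non-zero status if every attempt fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "mmafmctl Device applyUpdates -j FilesetName"
# on the old primary, retrying if it fails (for example, because the
# acting primary is overloaded).
# $1 = device, $2 = fileset.
# MAX_ATTEMPTS and RETRY_DELAY (seconds) are illustrative, not product settings.
apply_updates_with_retry() {
    local device=$1 fileset=$2
    local max=${MAX_ATTEMPTS:-5} delay=${RETRY_DELAY:-60}
    local attempt=1
    while [ "$attempt" -le "$max" ]; do
        if mmafmctl "$device" applyUpdates -j "$fileset"; then
            echo "applyUpdates succeeded on attempt $attempt"
            return 0
        fi
        echo "applyUpdates failed (attempt $attempt of $max)" >&2
        attempt=$((attempt + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

One way to use it: call it repeatedly while the applications are still running on the acting primary, then stop the applications, call it one final time, and only then run failbackToPrimary --stop.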
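Converting the acting primary back to a secondary can be sketched in the same way. This sketch runs on the acting primary and takes the primary ID as an argument rather than parsing getPrimaryId output, whose format is not described here. The function names, the DRY_RUN variable, and relinking the fileset with mmlinkfileset at its previous junction path are assumptions; the procedure above only states that the fileset is unlinked before mmchfileset is run. The NFS restart is left as a manual step because how NFS is served differs between sites.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical helper, run on the acting primary (the original secondary).
# $1 = device, $2 = fileset, $3 = junction path,
# $4 = primary ID of the old primary fileset (from
#      "mmafmctl Device getPrimaryId -j FilesetName" on the old primary).
convert_to_secondary() {
    local device=$1 fileset=$2 junction=$3 primary_id=$4
    if [ -z "$primary_id" ]; then
        echo "usage: convert_to_secondary Device Fileset JunctionPath PrimaryID" >&2
        return 2
    fi
    run mmunlinkfileset "$device" "$fileset" || return 1
    run mmchfileset "$device" "$fileset" -p afmMode=secondary -p afmPrimaryID="$primary_id" || return 1
    # Assumption: the fileset is linked again at its previous junction path.
    run mmlinkfileset "$device" "$fileset" -J "$junction" || return 1
    # How NFS is restarted is site-specific, so it is only reported here.
    echo "Restart NFS on this site so the primary can reach the secondary export."
}
```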