Failback procedure if new backups were taken after failover to the DR site
Procedure
- Stop the DR system by running:
apstop
Watch the output of the following command until the system is stopped:
ap state -d
- Unmount the ext_mnt file system on the DR system by running:
mmunmount ext_mnt -a
Monitor the output of the following command until ext_mnt is no longer shown as mounted on any node:
mmlsmount all -L
- Export the GPFS file system to a metadata file by running:
mmexportfs ext_mnt -o /home/ext_san.config
- Use scp to copy the metadata file to the prod system at e1n1:/home/ext_san.config. Then, on the prod system, use scp to copy this file to the same directory on all connector nodes so that copies exist on multiple nodes.
- Stop the prod system by running:
apstop
Watch the output of the following command until the system is stopped:
ap state -d
- Unmount the ext_mnt file system on the prod system by running:
mmunmount ext_mnt -a
Monitor the output of the following command until ext_mnt is no longer shown as mounted on any node:
mmlsmount all -L
- Delete the ext_mnt file system on the prod system by running:
mmdelfs ext_mnt
Watch the output of the following command until it indicates that ext_mnt is deleted:
mmlsfs ext_mnt
- Import the file system from the metadata file on the prod system by running:
mmimportfs ext_mnt -i /home/ext_san.config
- Mount the ext_mnt file system on the prod system by running:
mmmount ext_mnt -a
- After mounting the ext_mnt file system on the prod system, watch the output of the following command until ext_mnt is mounted on all expected nodes (typically five nodes when two connector nodes exist):
mmlsmount all -L
- Get the prod system online and ready for use by running:
apstart
- Watch ap state -d, and verify that ap apps shows VDB as ENABLED. If it is not enabled, run:
ap apps enable vdb
Watch the output of the following command until VDB is shown as ENABLED:
ap apps
- Verify that ap node -d shows one of the connector nodes as VDB_MASTER, and then use ssh to connect to that connector node. For example, if enclosure7.node1 is VDB_MASTER, ssh to e7n1.
- Enter the NPS® container:
docker exec -it ipshost1 bash
- Monitor the output of the following command until the system is online:
nzstate -local
Note: The estimated time for nzstate -local to report that the system is online is 10 to 45 minutes.
- Run nzrestore to restore from the data that was backed up and replicated to the prod site SAN. While nzrestore is in progress, you can proceed with the next step.
- Start the DR system by running:
apstart
Watch the output of the following command until the DR system is online; the same command can then be used to monitor its readiness and health:
ap state -d
With this step, the failback is complete. You can now activate replication from prod to DR and resume production from the production site.
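Several steps in this procedure (watching ap state -d after apstop or apstart, watching ap apps, monitoring nzstate -local) come down to rerunning a status command until its output changes. The following is a minimal POSIX shell sketch of that polling pattern, not part of the product tooling: the function name wait_for_output is hypothetical, and the 'Stopped' and 'Online' strings in the usage comments are assumptions about the command output that you should check against what your system actually prints.

```shell
#!/bin/sh
# wait_for_output (hypothetical helper, not part of the ap, GPFS, or NPS tools):
# rerun a status command until its combined stdout/stderr contains PATTERN,
# giving up after MAX_ATTEMPTS runs spaced INTERVAL seconds apart.
# Usage: wait_for_output PATTERN MAX_ATTEMPTS INTERVAL CMD [ARGS...]
wait_for_output() {
    local pattern="$1" max_attempts="$2" interval="$3"
    shift 3
    local attempt=1
    while :; do
        # grep -q only reports whether PATTERN appeared in this run's output.
        if "$@" 2>&1 | grep -q -- "$pattern"; then
            echo "matched '$pattern' after $attempt attempt(s)"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "gave up waiting for '$pattern' after $attempt attempt(s)" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}

# Examples (not run here). 'Stopped' and 'Online' are ASSUMED substrings of
# the status output; confirm them on your own system first.
#   apstop
#   wait_for_output 'Stopped' 60 30 ap state -d     # roughly 30 minutes at most
#   wait_for_output 'Online' 45 60 nzstate -local   # roughly 45 minutes, per the note
```

A bounded attempt count keeps a stuck system from hanging a script forever; the non-zero return lets a wrapper stop before running the next, more destructive step.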
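The step that copies /home/ext_san.config to the connector nodes can be scripted so that a failed copy is noticed right away, since those copies are the only other source of the GPFS metadata. This sketch assumes scp from the prod system to each node already works; the function name copy_config_to_nodes and the placeholder node names are hypothetical, because the procedure does not list the connector host names.

```shell
#!/bin/sh
# copy_config_to_nodes (hypothetical helper): copy the exported GPFS
# metadata file to the same path on each listed node with scp, and
# report which copies failed. Node names are supplied by the caller.
# Usage: copy_config_to_nodes FILE NODE [NODE...]
copy_config_to_nodes() {
    local file="$1"
    shift
    # Refuse to distribute a missing or empty export file.
    if [ ! -s "$file" ]; then
        echo "metadata file $file is missing or empty" >&2
        return 1
    fi
    local failed=0
    local node
    for node in "$@"; do
        if scp "$file" "$node:$file"; then
            echo "copied $file to $node"
        else
            echo "copy to $node failed" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every node received a copy.
    [ "$failed" -eq 0 ]
}

# Example on the prod system (not run here); replace the placeholders
# with your actual connector node host names:
#   copy_config_to_nodes /home/ext_san.config <connector-node-1> <connector-node-2>
```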
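Watching mmlsfs ext_mnt until the file system is deleted can be keyed on the command's exit status instead of its text. The sketch below assumes that mmlsfs exits non-zero once the file system no longer exists, which is worth confirming on your GPFS level; the function name wait_until_fs_deleted is hypothetical.

```shell
#!/bin/sh
# wait_until_fs_deleted (hypothetical helper): rerun mmlsfs for FS until the
# command fails. This sketch ASSUMES that mmlsfs exits non-zero once the file
# system no longer exists; verify that on your GPFS level before relying on it.
# Usage: wait_until_fs_deleted FS MAX_ATTEMPTS INTERVAL
wait_until_fs_deleted() {
    local fs="$1" max_attempts="$2" interval="$3"
    local attempt=1
    # Keep looping while mmlsfs still succeeds, i.e. FS is still defined.
    while mmlsfs "$fs" >/dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "$fs is still listed after $attempt check(s)" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    echo "$fs is no longer listed after $attempt check(s)"
}

# Example on the prod system (not run here):
#   mmdelfs ext_mnt
#   wait_until_fs_deleted ext_mnt 20 15
```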
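Both mount checks in the procedure read mmlsmount all -L: after unmounting, ext_mnt should be mounted on no nodes, and after mounting on prod it should be mounted on the expected number (typically five with two connector nodes). A sketch that turns the output into a node count, assuming summary lines of the form "File system ext_mnt is mounted on 5 nodes:" or "File system ext_mnt is not mounted." (an assumption about the output format); mounted_node_count and check_mount_count are hypothetical names.

```shell
#!/bin/sh
# mounted_node_count (hypothetical helper): read mmlsmount output on stdin
# and print how many nodes FS is mounted on. It ASSUMES summary lines like
#   File system ext_mnt is mounted on 5 nodes:
#   File system ext_mnt is not mounted.
# and prints 0 when no "mounted on N ..." line is found for FS.
mounted_node_count() {
    awk -v fs="$1" '
        $1 == "File" && $2 == "system" && $3 == fs && $5 == "mounted" && $6 == "on" {
            count = $7
        }
        END { print count + 0 }
    '
}

# check_mount_count (hypothetical helper): succeed only when FS is mounted
# on exactly EXPECTED nodes, reading mmlsmount output on stdin.
check_mount_count() {
    local fs="$1" expected="$2" actual
    actual=$(mounted_node_count "$fs")
    echo "$fs mounted on $actual node(s), expected $expected"
    [ "$actual" -eq "$expected" ]
}

# Examples (not run here):
#   mmlsmount all -L | check_mount_count ext_mnt 0   # after mmunmount ext_mnt -a
#   mmlsmount all -L | check_mount_count ext_mnt 5   # after mmmount ext_mnt -a
```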
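The VDB check after apstart can be made scriptable so that ap apps enable vdb runs only when needed. This sketch assumes ap apps prints one line per application with the name and state as separate whitespace-delimited words; that layout, and the function name vdb_is_enabled, are assumptions rather than documented behavior.

```shell
#!/bin/sh
# vdb_is_enabled (hypothetical helper): read 'ap apps' output on stdin and
# succeed if some line has the word VDB and also the word ENABLED.
# The one-line-per-application layout is an ASSUMPTION about the output.
vdb_is_enabled() {
    awk '
        {
            has_vdb = 0; has_enabled = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "VDB") has_vdb = 1
                # Exact word match, so DISABLED does not count as ENABLED.
                if ($i == "ENABLED") has_enabled = 1
            }
            if (has_vdb && has_enabled) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example on the prod system (not run here):
#   ap apps | vdb_is_enabled || ap apps enable vdb
```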
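The procedure maps the VDB_MASTER node name to its short host name for ssh (enclosure7.node1 becomes e7n1). A sketch of that lookup, assuming the VDB_MASTER role appears on the same line of ap node -d output as the enclosureN.nodeM name; that line layout and the function name vdb_master_host are assumptions.

```shell
#!/bin/sh
# vdb_master_host (hypothetical helper): read 'ap node -d' output on stdin,
# find the first line mentioning VDB_MASTER, take the enclosureN.nodeM name
# on that line, and print the short host name eNnM used for ssh
# (enclosure7.node1 -> e7n1). The output layout is an ASSUMPTION.
vdb_master_host() {
    awk '
        /VDB_MASTER/ {
            for (i = 1; i <= NF; i++) {
                if (match($i, /enclosure[0-9]+\.node[0-9]+/)) {
                    name = substr($i, RSTART, RLENGTH)
                    sub(/^enclosure/, "e", name)
                    sub(/\.node/, "n", name)
                    print name
                    found = 1
                    exit
                }
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example on the prod system (not run here):
#   ssh "$(ap node -d | vdb_master_host)"
```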