Failover and restore operations to the intermediate site during a planned outage
Use this process to run failover and restore operations to the intermediate (B) site during a planned outage.
About this task
For this scenario, assume that you have to shut down the local site for any reason, move production from the local site to the intermediate site, and then return production to the local site. You can use the following failover and failback procedures for this scenario. It is assumed that you established Global Mirror sessions that are creating consistency groups at the local site and sending them to the remote site.
During the outage and until you resume processing at the local site, you run a failover operation to allow operations to run from your intermediate site, which is protected by a two-site Global Mirror configuration. Global Mirror continues sending updates to the storage unit at the remote site and continues to form consistency groups. When production is ready to return to the local site, you run a failback operation.
Complete these tasks for failover and restore operations at the intermediate site: (The steps in this scenario are examples.)
Procedure
- At the local site, ensure that data consistency is achieved between the A to B volume pairs. This process keeps the A volumes and B volumes consistent with each other and allows consistent data to be copied to the remote site. You can use either of the following methods to create data consistency:
- Quiesce I/O processing to the A volumes at the local site. Continue to step 2.
- Freeze write activity to the Metro Mirror primary volumes by completing the following steps:
- Freeze updates to the A volumes in Metro Mirror relationships
across the affected LSSs. Enter the freezepprc command at the dscli command prompt with the following parameters and variables:
dscli> freezepprc -dev IBM.2107-130165X -remotedev IBM.2107-75ALA2P 07:12
The following represents an example of the output:
CMUC00161W freezepprc: Remote Mirror and Copy consistency group 07:12 successfully created.
This process ensures that the B volumes are consistent at the time of the freeze. (One command per storage unit or LSS is required.) As a result of the freeze action, the following actions are taken:
- The established paths between the logical subsystem (LSS) pairs are deleted.
- The volume pairs that are associated with the source and target LSSs are suspended. During this time, the storage unit collects data that is sent to the A Metro Mirror volumes.
- I/O processing to the Metro Mirror volume pairs is temporarily queued during the time that updates are frozen.
- If wanted, you can view the state of the pair status
at the local site after the freezepprc command has been processed.
Enter the lspprc command at the dscli command prompt with the following parameters and variables:
dscli> lspprc -dev IBM.2107-130165X -remotedev IBM.2107-75ALA2P -fmt default 0700-075f
The following represents an example of the output:
Notes:
- The command example uses the command parameter -fmt default. This command parameter specifies that the output be set to a space-separated plain text table.
- The following table format is presented for clarity. The actual report is not displayed in this format.
- The report example represents the information that is reported on when you do not specify the -l parameter.
ID         State      Reason  Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
0700:1200  Suspended  Freeze  Metro Mirror  07         unknown         Disabled       Invalid
0701:1201  Suspended  Freeze  Metro Mirror  07         unknown         Disabled       Invalid
0702:1202  Suspended  Freeze  Metro Mirror  07         unknown         Disabled       Invalid
- Resume operations following a freeze. Issue the unfreezepprc command to allow I/O processing to resume for the specified volume pairs.
Note: This activity is sometimes referred to as a thaw operation.
Enter the unfreezepprc command at the dscli command prompt with the following parameters and variables:
dscli> unfreezepprc -dev IBM.2107-130165X -remotedev IBM.2107-75ALA2P 07:12
The following represents an example of the output:
CMUC00198I unfreezepprc: Remote Mirror and Copy pair 07:12 successfully thawed.
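When the freeze and thaw commands are scripted, it can help to split each output line into its message ID, severity, and command name before deciding whether to continue. The following Python sketch is one way to do that. It assumes that the final letter of the message ID indicates severity (I, W, or E); the function and class names are hypothetical and are not part of the DS CLI. Note that the freezepprc example above reports success with a W suffix, so a warning does not by itself mean that the command failed.

```python
import re
from typing import NamedTuple

# Assumption: a DS CLI message starts with an ID such as CMUC00161W whose
# final letter is the severity, followed by "command: message text".
MESSAGE_RE = re.compile(r"^(CMU[A-Z]\d{5})([IWE])\s+(\w+):\s+(.*)$")

SEVERITY = {"I": "informational", "W": "warning", "E": "error"}


class DscliMessage(NamedTuple):
    message_id: str
    severity: str
    command: str
    text: str


def parse_message(line: str) -> DscliMessage:
    """Split one output line into message ID, severity, command, and text."""
    match = MESSAGE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a DS CLI message: {line!r}")
    prefix, letter, command, text = match.groups()
    return DscliMessage(prefix + letter, SEVERITY[letter], command, text)


# The two output lines shown in the freeze and thaw steps above.
freeze = parse_message(
    "CMUC00161W freezepprc: Remote Mirror and Copy consistency group "
    "07:12 successfully created."
)
thaw = parse_message(
    "CMUC00198I unfreezepprc: Remote Mirror and Copy pair 07:12 "
    "successfully thawed."
)
print(freeze.message_id, freeze.severity, freeze.command)
print(thaw.message_id, thaw.severity, thaw.command)
```

A script would typically stop on an error severity and log, rather than abort on, a warning such as the one that freezepprc returns.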
- Issue a failover command to the B to A volume pairs. This process detects that the B volumes are cascaded volumes at the intermediate site. When the command processes, the B volumes remain as primaries in a duplex pending state and secondaries to the A volumes. The B volumes remain nonexistent (or unavailable) secondary volumes to the A volumes in a Metro Mirror relationship. (In a cascaded relationship, the B volumes cannot be primary volumes in a Metro Mirror and Global Copy relationship at the same time.) When the direction of the volumes is switched and I/O processing is directed to the new primary B volumes, it is essential that the original primary volumes (the A volumes) be the same size as the original secondary volumes (the B volumes).
See Running a recovery failover operation for more information.
Enter the failoverpprc command at the dscli command prompt with the following parameters and variables:
dscli> failoverpprc -dev IBM.2107-75ALA2P -remotedev IBM.2107-130165X -type gcp -cascade 1200-125f:1a00-1a5f
The following represents an example of the output:
CMUC00196I failoverpprc: Remote Mirror and Copy pair 1200:1A00 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 1201:1A01 successfully reversed.
- Redirect host I/O processing to the B volumes. Changes are recorded on the B volumes until the A volumes can be resynchronized with the B volumes.
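The failover command above takes a range specification (1200-125f:1a00-1a5f) and reports one message per volume pair. The following Python sketch shows how such a specification could be expanded into individual pairs, for example to confirm that both sides of the range contain the same number of volumes. It assumes that volume IDs are four-digit hexadecimal values and that the n-th source volume pairs with the n-th target volume, which matches the 1200:1A00 and 1201:1A01 pairs in the example output. The function names are hypothetical.

```python
from __future__ import annotations


def expand_range(spec: str) -> list[str]:
    """Expand '1200-125f' into ['1200', '1201', ..., '125F']."""
    first, _, last = spec.partition("-")
    start = int(first, 16)
    end = int(last or first, 16)  # a single ID is a range of one volume
    if end < start:
        raise ValueError(f"descending range: {spec}")
    return [f"{value:04X}" for value in range(start, end + 1)]


def expand_pairs(spec: str) -> list[tuple[str, str]]:
    """Expand 'src-range:tgt-range' into (source, target) volume pairs."""
    sources, _, targets = spec.partition(":")
    source_ids = expand_range(sources)
    target_ids = expand_range(targets)
    if len(source_ids) != len(target_ids):
        raise ValueError("source and target ranges differ in length")
    return list(zip(source_ids, target_ids))


pairs = expand_pairs("1200-125f:1a00-1a5f")
print(len(pairs), pairs[0], pairs[-1])
```

Checking the pair count before issuing the command is a cheap way to catch a mistyped range.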
- When the A volumes are ready to return to production,
pause the Global Mirror session
between the B to C volumes. Direct this command to the same LSS that you used to start the session. This step is needed to later change the direction of the B volumes and restore the A volumes. Enter the pausegmir command at the dscli command prompt with the following parameters and variables:
dscli> pausegmir -dev IBM.2107-75ALA2P -quiet -lss 07 -session 1
The following represents an example of the output:
CMUC00165I pausegmir: Global Mirror for session 1 successfully paused.
See Pausing Global Mirror processing for more information.
- Suspend (pause) the B to C volume pairs. Because the site B volumes cannot be source volumes for Metro Mirror and Global Copy relationships at the same time, you must suspend the B to C volumes so that B to A volumes can be established. This step stops all incoming write I/O operations to the affected B and C volume pairs and helps prepare for a later resynchronization of the A volumes with the current operating B volumes.
Enter the pausepprc command at the dscli command prompt with the following parameters and variables:
dscli> pausepprc -dev IBM.2107-75ALA2P -remotedev IBM.2107-1831760 1200-125f:0700-075f
The following represents an example of the output:
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 1200:0700 relationship successfully paused.
CMUC00157I pausepprc: Remote Mirror and Copy volume pair 1201:0701 relationship successfully paused.
See Pausing a Metro Mirror relationship for more information.
- Establish paths between the local site LSS and intermediate site LSS that contain the B to A Metro Mirror volumes.
Enter the mkpprcpath command at the dscli command prompt with the following parameters and variables:
dscli> mkpprcpath -dev IBM.2107-75ALA2P -remotedev IBM.2107-130165X -remotewwnn 5005076303FFC550 -srclss 07 -tgtlss 12 -consistgrp I0102:I0031 I0002:I0102
The following represents an example of the output:
CMUC00149I mkpprcpath: Remote Mirror and Copy path 07:12 successfully established.
See Creating remote mirror and copy paths for more information.
- Issue a failback command to the B volumes (with A volumes as secondaries). Host I/O processing continues uninterrupted to the B volumes as the A volumes are made current. This command copies the changes back to the A volumes that were made to the B volumes while hosts are running on the B volumes. (In a DS CLI environment, where the local and intermediate sites use different management consoles, you have to use a different DS CLI session for the management console of the B volumes at the intermediate site.) See Running a recovery failback operation for more information.
Enter the failbackpprc command at the dscli command prompt with the following parameters and variables:
dscli> failbackpprc -dev IBM.2107-75ALA2P -remotedev IBM.2107-130165X -type gcp 1200-125f:1a00-1a5f
The following represents an example of the output:
CMUC00197I failbackpprc: Remote Mirror and Copy pair 1200:1a00 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 1201:1a01 successfully failed back.
- Wait for the copy process of the B to A volumes to reach
full duplex status (all out-of-sync tracks have completed copying).
Host writes are no longer tracked. You can monitor when the number of out-of-sync tracks reaches zero by querying the status of the volumes. See Viewing information about Metro Mirror relationships for more information.
Enter the lspprc command at the dscli command prompt with the following parameters and variables:
dscli> lspprc -l 1200-125f:1a00-1a5f
The following represents an example of the output:
ID         State         Reason  Type          Out Of Sync Tracks  Tgt Read  Src Cascade
1200:1a00  Copy Pending  -       Metro Mirror  46725               Disabled  Disabled
1201:1a01  Copy Pending  -       Metro Mirror  46725               Disabled  Disabled

Tgt Cascade  Date Suspended  SourceLSS  Timeout (secs)  Crit Mode  First Pass Status  Incremental Resync  Tgt Write
Invalid      -               10         Unknown         Disabled   Invalid            Enabled             Enabled
Invalid      -               10         Unknown         Disabled   Invalid            Enabled             Enabled
- Quiesce host I/O processing to the B volumes.
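Waiting for the out-of-sync track count to reach zero before quiescing host I/O is a natural candidate for a polling loop. The following Python sketch illustrates the idea only. The query_tracks callable is hypothetical: it stands in for whatever mechanism you use to run lspprc -l and extract the Out Of Sync Tracks value for each pair, which is not shown here because the actual report layout differs from the table above. The polling interval and the limit on polls are arbitrary assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_sync(
    query_tracks: Callable[[], Dict[str, int]],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until every pair reports zero out-of-sync tracks.

    Returns the number of polls that were needed. Raises TimeoutError if
    some pairs are still out of sync after max_polls queries.
    """
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    remaining: Dict[str, int] = {}
    for attempt in range(1, max_polls + 1):
        tracks = query_tracks()
        remaining = {pair: count for pair, count in tracks.items() if count > 0}
        if not remaining:
            return attempt
        sleep(poll_seconds)
    raise TimeoutError(f"pairs still out of sync: {sorted(remaining)}")


# Simulated query results; only the first sample uses values from the
# example report above, the later samples are made up for the demonstration.
samples = iter([
    {"1200:1a00": 46725, "1201:1a01": 46725},
    {"1200:1a00": 120, "1201:1a01": 0},
    {"1200:1a00": 0, "1201:1a01": 0},
])
polls = wait_for_sync(lambda: next(samples), sleep=lambda _seconds: None)
print(polls)
```

Passing the sleep function as a parameter keeps the loop testable without real delays.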
- Issue a failover command to the A to B volume pairs.
This process ends the B to A volume relationships and establishes the A to B volume relationships. Enter the failoverpprc command at the dscli command prompt with the following parameters and variables:
dscli> failoverpprc -dev IBM.2107-130126X -remotedev IBM.2107-75ALA2P -type mmir 1a00-1a5f:1200-125f
The following represents an example of the output:
CMUC00196I failoverpprc: Remote Mirror and Copy pair 1A00:1200 successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 1A01:1201 successfully reversed.
See Running a recovery failover operation for more information.
- After the failover operation, you can view the status of the volumes with the lspprc command. Enter the lspprc command at the dscli command prompt with the following parameters and variables:
dscli> lspprc -dev IBM.2107-130126X -remotedev IBM.2107-75ALA2P -fmt default 1a00-1a5f
The following represents an example of the output:
Notes:
- The command example uses the command parameter -fmt default. This command parameter specifies that the output be set to a space-separated plain text table.
- The following table format is presented for clarity. The actual report is not displayed in this format.
- The report example represents the information that is reported on when you do not specify the -l parameter.
ID         State      Reason       Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
0700:1200  Suspended  Host Source  Metro Mirror  1A         unknown         Disabled       Invalid
0701:1201  Suspended  Host Source  Metro Mirror  1A         unknown         Disabled       Invalid
0702:1202  Suspended  Host Source  Metro Mirror  1A         unknown         Disabled       Invalid
- Reestablish paths (that were disabled by the freeze operation) between the local site LSS and intermediate site LSS that contain the B to A Metro Mirror volume pairs. Enter the mkpprcpath command at the dscli command prompt with the following parameters and variables:
dscli> mkpprcpath -dev IBM.2107-130165X -remotedev IBM.2107-75ALA2P -remotewwnn 5005076303FFC550 -srclss 07 -tgtlss 12 -consistgrp I0102:I0031 I0002:I0102
The following represents an example of the output:
CMUC00149I mkpprcpath: Remote Mirror and Copy path 07:12 successfully established.
See Reestablishing remote mirror and copy paths (site A to site B) for more information.
- Issue a failback command to the A to B volumes. This failback command completes the restoration of the A to B volume relationships (the B volume becomes the target). The replication of the data starts immediately when the command is finished. Depending on how many tracks changed during the outage, resynchronization might take a long time.
Note: At this point, you can resume host I/O processing to the local site if optimizing host availability is critical. However, new host I/O that is written to the A volumes at the local site is not fully protected by Global Mirror processing until the Global Mirror operation is restored in step 16 (Resume Global Mirror).
Enter the failbackpprc command at the dscli command prompt with the following parameters and variables:
dscli> failbackpprc -dev IBM.2107-130165X -remotedev IBM.2107-75ALA2P -type mmir 1a00-1a5f:1200-125f
The following represents an example of the output:
CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A00:1200 successfully failed back.
CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A01:1201 successfully failed back.
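Because failoverpprc and failbackpprc report one message per volume pair, a script can compare the pairs that it expected to process with the pairs that the output actually lists. The following Python sketch shows one possible check. It assumes the message wording shown in the examples in this topic ("pair XXXX:YYYY successfully ..."), and it compares IDs without regard to case because the examples show both 1200:1a00 and 1A00:1200. The function names are hypothetical, and the pattern does not cover the differently worded pausepprc and mkpprc messages.

```python
from __future__ import annotations

import re

# Matches the pair ID in lines such as
# "... Remote Mirror and Copy pair 1A00:1200 successfully failed back."
PAIR_RE = re.compile(r"pair\s+([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+successfully")


def reported_pairs(output: str) -> set[tuple[str, str]]:
    """Return the (source, target) pairs that the output reports as successful."""
    return {(src.upper(), tgt.upper()) for src, tgt in PAIR_RE.findall(output)}


def missing_pairs(
    expected: list[tuple[str, str]], output: str
) -> list[tuple[str, str]]:
    """Return the expected pairs that have no success message in the output."""
    wanted = {(src.upper(), tgt.upper()) for src, tgt in expected}
    return sorted(wanted - reported_pairs(output))


output = (
    "CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A00:1200 "
    "successfully failed back.\n"
    "CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A01:1201 "
    "successfully failed back.\n"
)
expected = [("1a00", "1200"), ("1a01", "1201"), ("1a02", "1202")]
print(missing_pairs(expected, output))
```

An empty result means that every expected pair was reported; anything else is worth investigating before you continue with the next step.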
- Reestablish Global Copy relationships between the B to C volumes with the -cascade option. When the failback operation has been done, Global Copy relationships can be re-created.
Enter the mkpprc command at the dscli command prompt with the following parameters and variables:
dscli> mkpprc -dev IBM.2107-75ALA2P -remotedev IBM.2107-1831760 -type gcp -mode nocp -cascade 1200-125f:0700-075f
The following represents an example of the output:
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 1200:0700 successfully created.
CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 1201:0701 successfully created.
- Wait until the first pass of the Global Copy processing of the B to C volume pairs has completed. You can monitor this activity by querying the status of the volumes.
Enter the lspprc command at the dscli command prompt with the following parameters and variables:
dscli> lspprc -dev IBM.2107-75ALA2P -remotedev IBM.2107-1831760 -l -fmt default 1200-125f:0700-075f
The following represents an example of the output:
ID         State         Reason  Type         Out Of Sync Tracks  Tgt Read  Src Cascade
0700:1200  Copy Pending  -       Global Copy  0                   Disabled  Disabled
0701:1201  Copy Pending  -       Global Copy  0                   Disabled  Disabled
0702:1202  Copy Pending  -       Global Copy  0                   Disabled  Disabled

Tgt Cascade  Date Suspended  SourceLSS  Timeout (secs)  Crit Mode  First Pass Status  Incremental Resync  Tgt Write
Invalid      -               07         Unknown         Disabled   True               Enabled             Enabled
Invalid      -               07         Unknown         Disabled   True               Enabled             Enabled
Invalid      -               07         Unknown         Disabled   True               Enabled             Enabled
- Resume Global Mirror. Now that the original infrastructure has been restored, you can resume the Global Mirror session.
Enter the resumegmir command at the dscli command prompt with the following parameters and variables:
dscli> resumegmir -dev IBM.2107-75ALA2P -session 1 -lss 07
The following represents an example of the output:
CMUC00164I resumegmir: Global Mirror for session 1 successfully resumed.
See Resuming Global Mirror processing for more information.
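The state-changing commands in this procedure must run in a fixed order, so a script that drives them might keep the sequence in one place and refuse to run a command out of turn. The following Python sketch illustrates that idea. The list reflects the order of the commands in the steps above and assumes that the freeze method (rather than quiescing I/O) was used in the first step. Query commands are ignored because they do not change state. The function and constant names are hypothetical.

```python
from typing import List, Optional

# State-changing DS CLI commands in the order used by this procedure,
# assuming the freezepprc method was chosen in the first step.
PROCEDURE_ORDER = [
    "freezepprc",
    "unfreezepprc",
    "failoverpprc",   # B to A, Global Copy with -cascade
    "pausegmir",
    "pausepprc",      # B to C
    "mkpprcpath",     # intermediate site to local site
    "failbackpprc",   # B to A
    "failoverpprc",   # A to B
    "mkpprcpath",     # local site to intermediate site
    "failbackpprc",   # A to B
    "mkpprc",         # B to C, Global Copy with -cascade
    "resumegmir",
]

# Commands that only report status and can be run at any point.
QUERY_COMMANDS = {"lspprc", "showgmir"}


def next_command(history: List[str]) -> Optional[str]:
    """Return the next state-changing command, or None when all are done."""
    done = [name for name in history if name not in QUERY_COMMANDS]
    if done != PROCEDURE_ORDER[: len(done)]:
        raise ValueError(f"commands ran out of order: {done}")
    if len(done) == len(PROCEDURE_ORDER):
        return None
    return PROCEDURE_ORDER[len(done)]


print(next_command(["freezepprc", "lspprc", "unfreezepprc", "failoverpprc"]))
```

Host-side actions such as quiescing or redirecting I/O are not DS CLI commands and are deliberately left out of the list.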
- Resume host I/O processing to the A volumes. Direct host I/O processing back to the A volumes at the local site.
- Verify that consistency groups are forming successfully.
Enter the showgmir -metrics command at the dscli command prompt with the following parameters and variables:
dscli> showgmir -metrics 07
The following represents an example of the output (shown here with one field per line for readability):
ID                             IBM.2107-130165X/07
Total Failed CG Count          0
Total Successful CG Count      55
Successful CG Percentage       100
Failed CG after Last Success   0
Last Successful CG Form Time   02/20/2006 11:38:25 MST
Coord. Time (milliseconds)     50
CG Interval Time (seconds)     0
Max CG Drain Time (seconds)    30
First Failure Control Unit     -
First Failure LSS              -
First Failure Status           No Error
First Failure Reason           -
First Failure Master State     -
Last Failure Control Unit      -
Last Failure LSS               -
Last Failure Status            No Error
Last Failure Reason            -
Last Failure Master State      -
Previous Failure Control Unit  -
Previous Failure LSS           -
Previous Failure Status        No Error
Previous Failure Reason        -
Previous Failure Master State  -
See Querying Global Mirror processing for more information.
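One way to decide whether consistency groups are forming is to compare the counters from two successive showgmir -metrics queries. The following Python sketch illustrates this under stated assumptions: the success percentage is taken to be successful groups divided by all attempted groups, and "forming successfully" is taken to mean that the successful count grew between samples with no failures since the last success. Both criteria are assumptions made for the example, not documented DS CLI behavior, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class CgMetrics(NamedTuple):
    total_failed: int               # Total Failed CG Count
    total_successful: int           # Total Successful CG Count
    failed_after_last_success: int  # Failed CG after Last Success


def success_percentage(metrics: CgMetrics) -> float:
    """Percentage of attempted consistency groups that formed successfully."""
    total = metrics.total_failed + metrics.total_successful
    if total == 0:
        raise ValueError("no consistency groups have been attempted")
    return 100.0 * metrics.total_successful / total


def is_forming(previous: CgMetrics, current: CgMetrics) -> bool:
    """True if new groups formed since the previous sample without failures."""
    return (
        current.total_successful > previous.total_successful
        and current.failed_after_last_success == 0
    )


earlier = CgMetrics(total_failed=0, total_successful=55,
                    failed_after_last_success=0)  # values from the example
later = CgMetrics(total_failed=0, total_successful=61,
                  failed_after_last_success=0)    # hypothetical later sample
print(success_percentage(earlier))
print(is_forming(earlier, later))
```

A single sample with a high percentage shows only past behavior; comparing two samples shows that groups are still being formed after the session was resumed.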