Failover and restore operations to the intermediate site during a planned outage

Use this process to run failover and restore operations to the intermediate (B) site during a planned outage.

About this task

For this scenario, assume that you have to shut down the local site for any reason and move production from the local site to the intermediate site and then return production back to the local site. You can use the following failover and failback procedures for this scenario. It is assumed that you established Global Mirror sessions that are creating consistency groups at the local site and sending them to the remote site.

During the outage and until you resume processing at the local site, you run a failover operation to allow operations to run from your intermediate site, which is protected by a two-site Global Mirror configuration. Global Mirror continues sending updates to the storage unit at the remote site and continues to form consistency groups. When production is ready to return to the local site, you run a failback operation.

Note: When a local site fails, systems must be reset and subsequently restarted using data from the B volumes following a failover operation. GDPS HyperSwap can do this transparently (without any system outage for systems running at the intermediate site) through the use of a single script statement for planned outages and autonomically for unplanned outages.

Complete these tasks for failover and restore operations at the intermediate site. (The steps in this scenario are examples.)

Procedure

  1. At the local site, ensure that data consistency is achieved between the A to B volume pairs. This process helps coordinate the A volumes and B volumes consistency and allows consistent data to be copied to the remote site.
    You can use either of the following methods to create data consistency:
    • Quiesce I/O processing to the A volumes at the local site. Continue to step 2.
    • Freeze write activity to the Metro Mirror primary volumes by completing the following steps:
    1. Freeze updates to the A volumes in Metro Mirror relationships across the affected LSSs.
      Enter the freezepprc command at the dscli command prompt with the following parameters and variables:
      dscli> freezepprc -dev IBM.2107-130165X -remotedev
      IBM.2107-75ALA2P 07:12
      The following represents an example of the output:
      CMUC00161W freezepprc: Remote Mirror and Copy consistency group 07:12
       successfully created.
      This process ensures that the B volumes are consistent at the time of the freeze. (One command per storage unit or LSS is required.) As a result of the freeze action, the following actions are taken:
      • The established paths between the logical subsystem (LSS) pairs are deleted.
      • The volume pairs that are associated with the source and target LSSs are suspended. During this time, the storage unit collects data that is sent to the A Metro Mirror volumes.
      • I/O processing to the Metro Mirror volume pairs is temporarily queued during the time that updates are frozen.
    2. Optionally, view the status of the volume pairs at the local site after the freezepprc command has been processed.
      Enter the lspprc command at the dscli command prompt with the following parameters and variables:
      dscli> lspprc -dev IBM.2107-130165X -remotedev
      IBM.2107-75ALA2P -fmt default 0700-075f
      The following represents an example of the output:
      Notes:
      1. The command example uses the command parameter -fmt default. This command parameter specifies that the output be set to a space-separated plain text table.
      2. The following table format is presented for clarity. The actual report is not displayed in this format.
      3. The report example represents the information that is reported on when you do not specify the -l parameter.
      See Viewing information about Metro Mirror relationships for more information.
      ID         State      Reason  Type          Source LSS  Timeout (secs)  Critical Mode  First Pass Status
      0700:1200  Suspended  Freeze  Metro Mirror  07          unknown         Disabled       Invalid
      0701:1201  Suspended  Freeze  Metro Mirror  07          unknown         Disabled       Invalid
      0702:1202  Suspended  Freeze  Metro Mirror  07          unknown         Disabled       Invalid
    3. Resume operations following a freeze.
      Issue the unfreezepprc command to allow I/O processing to resume for the specified volume pairs.
      Note: This activity is sometimes referred to as a thaw operation.
      Enter the unfreezepprc command at the dscli command prompt with the following parameters and variables:
      dscli> unfreezepprc -dev IBM.2107-130165X -remotedev
      IBM.2107-75ALA2P 07:12
      The following represents an example of the output:
      CMUC00198I unfreezepprc: Remote Mirror and Copy pair 07:12  
      successfully thawed.
  2. Issue a failover command to the B to A volume pairs.
    This process detects that the B volumes are cascaded volumes at the intermediate site. When the command processes, the B volumes remain as primaries in a duplex pending state and secondaries to the A volumes. The B volumes remain nonexistent (or unavailable) secondary volumes to the A volumes in a Metro Mirror relationship. (In a cascaded relationship, the B volumes cannot be primary volumes in a Metro Mirror and Global Copy relationship at the same time.) When the direction of the volumes is switched and I/O processing is directed to the new primary B volumes, it is essential that the primary volumes (the A volumes) be the same size as the secondary volumes (the B volumes).

    See Running a recovery failover operation for more information.

    Enter the failoverpprc command at the dscli command prompt with the following parameters and variables:

    dscli> failoverpprc -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-130165X -type gcp -cascade  1200-125f:1a00-1a5f
    The following represents an example of the output:
    CMUC00196I failoverpprc: Remote Mirror and Copy pair 1200:1A00 
    successfully reversed.
    CMUC00196I failoverpprc: Remote Mirror and Copy pair 1201:1A01 
    successfully reversed.
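
The failoverpprc example names the volume pairs as a range (1200-125f:1a00-1a5f) and reports one message per pair. The following Python sketch shows one way to check that every pair in a range has a success message. It assumes that ranges map pairwise in ascending order, as the example output suggests. The helper names expand_pairs and confirmed_pairs are hypothetical and are not part of the DS CLI.

```python
import re

# Hypothetical helpers for checking captured command output; not part of the DS CLI.
MSG_ID = re.compile(r"CMUC\d{5}[IWE]\b")                  # for example CMUC00196I
PAIR = re.compile(r"\b[0-9A-Fa-f]{4}:[0-9A-Fa-f]{4}\b")   # for example 1200:1A00


def expand_pairs(spec):
    """Expand '1200-125f:1a00-1a5f' into ['1200:1A00', ..., '125F:1A5F']."""
    source, target = spec.split(":")

    def span(text):
        first, _, last = text.partition("-")
        return range(int(first, 16), int(last or first, 16) + 1)

    sources, targets = span(source), span(target)
    if len(sources) != len(targets):
        raise ValueError("source and target ranges differ in length")
    return [f"{s:04X}:{t:04X}" for s, t in zip(sources, targets)]


def confirmed_pairs(output):
    """Return the pairs that have an informational 'successfully' message."""
    text = " ".join(output.split())  # undo line wrapping in the captured output
    starts = [m.start() for m in MSG_ID.finditer(text)] + [len(text)]
    done = set()
    for begin, end in zip(starts, starts[1:]):
        message = text[begin:end]
        pair = PAIR.search(message)
        # The character after the five digits is the severity (I, W, or E).
        if pair and message[9] == "I" and "successfully" in message:
            done.add(pair.group().upper())
    return done


sample = """CMUC00196I failoverpprc: Remote Mirror and Copy pair 1200:1A00
successfully reversed.
CMUC00196I failoverpprc: Remote Mirror and Copy pair 1201:1A01
successfully reversed."""

# Pairs in the requested range that the captured output does not confirm.
missing = [p for p in expand_pairs("1200-1201:1a00-1a01")
           if p not in confirmed_pairs(sample)]
```

An empty missing list means that every requested pair was reported as reversed. The same check can be applied to the output of the other commands in this procedure that report one message per volume pair.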
    
  3. Redirect host I/O processing to the B volumes. Changes are recorded on the B volumes until the A volumes can be resynchronized with the B volumes.
  4. When the A volumes are ready to return to production, pause the Global Mirror session between the B to C volumes.
    Direct this command to the same LSS that you used to start the session. This step is needed to later change the direction of the B volumes and restore the A volumes. Enter the pausegmir command at the dscli command prompt with the following parameters and variables:
    dscli> pausegmir -dev IBM.2107-75ALA2P -quiet  -lss 07 -session 1
    The following represents an example of the output:
    CMUC00165I pausegmir: Global Mirror for session 1 successfully paused.

    See Pausing Global Mirror processing for more information.

  5. Suspend (pause) the B to C volume pairs.
    Because the site B volumes cannot be source volumes for Metro Mirror and Global Copy relationships at the same time, you must suspend the B to C volume pairs so that the B to A volume pairs can be established. This step stops all incoming write I/O operations to the affected B and C volume pairs and helps prepare for a later resynchronization of the A volumes with the current operating B volumes.

    Enter the pausepprc command at the dscli command prompt with the following parameters and variables:

    dscli> pausepprc -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-1831760 1200-125f:0700-075f
    The following represents an example of the output:
    CMUC00157I pausepprc: Remote Mirror and Copy volume pair 1200:0700 
    relationship successfully paused.
    CMUC00157I pausepprc: Remote Mirror and Copy volume pair 1201:0701 
    relationship successfully paused.
    
    See Pausing a Metro Mirror relationship for more information.
  6. Establish paths between the local site LSS and intermediate site LSS that contain the B to A Metro Mirror volumes.
    Enter the mkpprcpath command at the dscli command prompt with the following parameters and variables:
    dscli> mkpprcpath -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-130165X -remotewwnn 5005076303FFC550
    -srclss 07 -tgtlss 12 -consistgrp I0102:I0031 I0002:I0102 
    The following represents an example of the output:
    CMUC00149I mkpprcpath: Remote Mirror and Copy path 07:12
     successfully established.
    
    See Creating remote mirror and copy paths for more information.
  7. Issue a failback command to the B volumes (with A volumes as secondaries). Host I/O processing continues uninterrupted to the B volumes as the A volumes are made current. This command copies the changes back to the A volumes that were made to the B volumes while hosts are running on the B volumes. (In a DS CLI environment, where the local and intermediate sites use different management consoles, you have to use a different DS CLI session for the management console of the B volumes at the intermediate site.) See Running a recovery failback operation for more information.
    Enter the failbackpprc command at the dscli command prompt with the following parameters and variables:
    dscli> failbackpprc -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-130165X
    -type gcp 1200-125f:1a00-1a5f
    The following represents an example of the output:
    CMUC00197I failbackpprc: Remote Mirror and Copy pair 1200:1a00 
    successfully failed back.
    CMUC00197I failbackpprc: Remote Mirror and Copy pair 1201:1a01 
    successfully failed back.
    
  8. Wait for the copy process of the B to A volumes to reach full duplex status (all out-of-sync tracks have completed copying).
    Host writes are no longer tracked. You can monitor when the number of out-of-sync tracks reaches zero by querying the status of the volumes. See Viewing information about Metro Mirror relationships for more information.
    Enter the lspprc command at the dscli command prompt with the following parameters and variables:
    dscli> lspprc -l 1200-125f:1a00-1a5f
    The following represents an example of the output:
    ID         State         Reason  Type          Out Of Sync Tracks  Tgt Read  Src Cascade
    1200:1a00  Copy Pending  -       Metro Mirror  46725               Disabled  Disabled
    1201:1a01  Copy Pending  -       Metro Mirror  46725               Disabled  Disabled

    Tgt Cascade  Date Suspended  Source LSS  Timeout (secs)  Crit Mode  First Pass Status  Incremental Resync  Tgt Write
    Invalid      -               10          Unknown         Disabled   Invalid            Enabled             Enabled
    Invalid      -               10          Unknown         Disabled   Invalid            Enabled             Enabled
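
Waiting for the out-of-sync track count to reach zero can be scripted as a polling loop. The following Python sketch is one possible shape for such a loop. The query argument is a hypothetical callable that you supply; it is assumed to return a mapping of pair ID to out-of-sync track count, for example from values read off the lspprc -l report. The interval and timeout defaults are illustrative only.

```python
import time


def wait_until_synced(query, interval=30.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every pair reports zero out-of-sync tracks.

    query is a caller-supplied (hypothetical) function that returns
    {pair_id: out_of_sync_tracks}. Returns the number of polls made and
    raises TimeoutError if pairs are still copying after the timeout.
    """
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        pending = {pair: n for pair, n in query().items() if n > 0}
        if not pending:
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"still out of sync: {sorted(pending)}")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop easy to exercise without real delays. Proceed to the next step only after the function returns.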
  9. Quiesce host I/O processing to the B volumes.
  10. Issue a failover command to the A to B volume pairs.
    This process ends the B to A volume relationships and establishes the A to B volume relationships. Enter the failoverpprc command at the dscli command prompt with the following parameters and variables:
    dscli> failoverpprc -dev IBM.2107-130165X -remotedev
    IBM.2107-75ALA2P -type mmir 1a00-1a5f:1200-125f
    The following represents an example of the output:
    CMUC00196I failoverpprc: Remote Mirror and Copy pair 1A00:1200 
    successfully reversed.
    CMUC00196I failoverpprc: Remote Mirror and Copy pair 1A01:1201 
    successfully reversed.
    
    See Running a recovery failover operation for more information.
  11. After the failover operation, you can view the status of the volumes with the lspprc command.
    Enter the lspprc command at the dscli command prompt with the following parameters and variables:
    dscli> lspprc -dev IBM.2107-130165X -remotedev
    IBM.2107-75ALA2P -fmt default 1a00-1a5f
    The following represents an example of the output:
    Notes:
    1. The command example uses the command parameter -fmt default. This command parameter specifies that the output be set to a space-separated plain text table.
    2. The following table format is presented for clarity. The actual report is not displayed in this format.
    3. The report example represents the information that is reported on when you do not specify the -l parameter.
    ID         State      Reason       Type          Source LSS  Timeout (secs)  Critical Mode  First Pass Status
    0700:1200  Suspended  Host Source  Metro Mirror  1A          unknown         Disabled       Invalid
    0701:1201  Suspended  Host Source  Metro Mirror  1A          unknown         Disabled       Invalid
    0702:1202  Suspended  Host Source  Metro Mirror  1A          unknown         Disabled       Invalid
  12. Reestablish paths (that were disabled by the freeze operation) between the local site LSS and intermediate site LSS that contain the A to B Metro Mirror volume pairs.
    Enter the mkpprcpath command at the dscli command prompt with the following parameters and variables:
    dscli> mkpprcpath -dev IBM.2107-130165X -remotedev
    IBM.2107-75ALA2P -remotewwnn 5005076303FFC550 -srclss
    07 -tgtlss 12 -consistgrp I0102:I0031 I0002:I0102 
    The following represents an example of the output:
    CMUC00149I mkpprcpath: Remote Mirror and Copy path 07:12  
    successfully established.
    See Reestablishing remote mirror and copy paths (site A to site B) for more information.
  13. Issue a failback command to the A to B volumes. This failback command completes the restoration of the A to B volume relationships (the B volume becomes the target).
    The replication of the data starts immediately when the command is finished. Depending on how many tracks changed during the outage, resynchronization might take a long time.
    Note: At this point, you can resume host I/O processing to the local site if optimizing host availability is critical. However, new host I/O that is written to the A volumes at the local site is not fully protected by Global Mirror processing until the Global Mirror operation is restored in step 16.
    Enter the failbackpprc command at the dscli command prompt with the following parameters and variables:
    dscli> failbackpprc -dev IBM.2107-130165X -remotedev
    IBM.2107-75ALA2P -type mmir
    1a00-1a5f:1200-125f
    The following represents an example of the output:
    CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A00:1200 
    successfully failed back.
    CMUC00197I failbackpprc: Remote Mirror and Copy pair 1A01:1201 
    successfully failed back.
    
  14. Reestablish Global Copy relationships between the B to C volumes with the -cascade option.
    When the failback operation has been done, Global Copy relationships can be re-created.
    Enter the mkpprc command at the dscli command prompt with the following parameters and variables:
    dscli> mkpprc -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-1831760
    -type gcp -mode nocp -cascade 1200-125f:0700-075f
    The following represents an example of the output:
    CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 1200:0700
     successfully created.
    CMUC00153I mkpprc: Remote Mirror and Copy volume pair relationship 1201:0701
     successfully created.
    
  15. Wait until the first pass of the Global Copy copying processing of the B to C volume pairs has completed.
    You can monitor this activity by querying the status of the volumes.
    Enter the lspprc command at the dscli command prompt with the following parameters and variables:
    dscli> lspprc -dev IBM.2107-75ALA2P -remotedev
    IBM.2107-1831760
    -l -fmt default 1200-125f:0700-075f
    The following represents an example of the output:
    ID         State         Reason  Type         Out Of Sync Tracks  Tgt Read  Src Cascade
    0700:1200  Copy Pending  -       Global Copy  0                   Disabled  Disabled
    0701:1201  Copy Pending  -       Global Copy  0                   Disabled  Disabled
    0702:1202  Copy Pending  -       Global Copy  0                   Disabled  Disabled

    Tgt Cascade  Date Suspended  Source LSS  Timeout (secs)  Crit Mode  First Pass Status  Incremental Resync  Tgt Write
    Invalid      -               07          Unknown         Disabled   True               Enabled             Enabled
    Invalid      -               07          Unknown         Disabled   True               Enabled             Enabled
    Invalid      -               07          Unknown         Disabled   True               Enabled             Enabled
  16. Resume Global Mirror. Now that the original infrastructure has been restored, you can resume the Global Mirror session.
    Enter the resumegmir command at the dscli command prompt with the following parameters and variables:
    dscli> resumegmir -dev IBM.2107-75ALA2P -session 1 -lss 07
    The following represents an example of the output:
    CMUC00164I resumegmir: Global Mirror for session 1 successfully resumed.

    See Resuming Global Mirror processing for more information.

  17. Resume host I/O processing to the A volumes. Redirect host I/O processing from the B volumes back to the A volumes at the local site.
  18. Verify that consistency groups are forming successfully.
    Enter the showgmir -metrics command at the dscli command prompt with the following parameters and variables:
    dscli> showgmir -metrics 07

    The following represents an example of the output:

    Field                                Value
    ID                                   IBM.2107-130165X/07
    Total Failed CG Count                0
    Total Successful CG Count            55
    Successful CG Percentage             100
    Failed CG after Last Success         0
    Last Successful CG Form Time         02/20/2006 11:38:25 MST
    Coord. Time (milliseconds)           50
    CG Interval Time (seconds)           0
    Max CG Drain Time (seconds)          30
    First Failure Control Unit           -
    First Failure LSS                    -
    First Failure Status                 No Error
    First Failure Reason                 -
    First Failure Master State           -
    Last Failure Control Unit            -
    Last Failure LSS                     -
    Last Failure Status                  No Error
    Last Failure Reason                  -
    Last Failure Master State            -
    Previous Failure Control Unit        -
    Previous Failure LSS                 -
    Previous Failure Status              No Error
    Previous Failure Reason              -
    Previous Failure Master State        -

    See Querying Global Mirror processing for more information.
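
One way to judge whether consistency groups are forming is to compare two showgmir -metrics reports taken some time apart. The following Python sketch assumes that you have already copied the relevant counters out of each report. The GmirMetrics class, its field names, and the groups_are_forming function are hypothetical names chosen for this illustration; they are not DS CLI output fields or commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GmirMetrics:
    """Counters copied by hand or by script from one showgmir -metrics report."""
    total_failed: int               # Total Failed CG Count
    total_successful: int           # Total Successful CG Count
    failed_after_last_success: int  # Failed CG after Last Success


def groups_are_forming(earlier, later):
    """Return True if the later report shows new successful consistency
    groups and no failures since the last successful one."""
    grew = later.total_successful > earlier.total_successful
    return grew and later.failed_after_last_success == 0


first = GmirMetrics(total_failed=0, total_successful=55, failed_after_last_success=0)
second = GmirMetrics(total_failed=0, total_successful=58, failed_after_last_success=0)
healthy = groups_are_forming(first, second)
```

The values in the first report match the example output above; the values in the second report are invented for the illustration. A result of False suggests checking the failure fields of the report before you rely on Global Mirror protection.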