Preserving redundant VIOS configuration on production site with IBM Geographically Dispersed Resiliency solution

Retain redundant VIOS configuration during disaster recovery

What is IBM Geographically Dispersed Resiliency for Power Systems?

IBM® Geographically Dispersed Resiliency for Power Systems™ is a disaster recovery (DR) solution announced in 2016. It provides a simplified, easy-to-deploy way to recover virtual machines (VMs) running on IBM POWER7® or IBM POWER8® processor-based servers across two sites. Geographically Dispersed Resiliency is similar to VMware Site Recovery Manager and IBM GDPS® solutions for IBM z Systems®, and is part of the GDPS family of disaster recovery solutions. Because disaster recovery of applications and services is key to business continuity, the IBM Geographically Dispersed Resiliency solution helps customers to automate the DR process during a failure.

You can learn more about Geographically Dispersed Resiliency for Power Systems in the following IBM developerWorks® wiki documents: Why GDR is the ideal DR solution for Power Systems and FAQ.

The controlling system (KSYS) is the fundamental component in the Geographically Dispersed Resiliency solution. The KSYS handles the discovery, monitoring, notification, recovery, and verification aspects.

After the initial configuration is complete, the KSYS node discovers all the hosts that are managed by the Hardware Management Consoles (HMCs) in both the active and the backup sites and displays the status. During discovery, the KSYS node monitors the discovery of all logical partitions (LPARs) or VMs in all the managed hosts within the selected site. The KSYS node collects the configuration information for each LPAR and displays the status. The KSYS node also discovers the disks of each VM and checks whether the VMs are currently configured for storage device mirroring.

After the discovery of the site is complete, the KSYS node fetches information from the HMC to check whether the backup site is capable of hosting the VMs during a disaster. The KSYS node also verifies storage replication-related details.

The Geographically Dispersed Resiliency solution supports both planned and unplanned disaster recovery methods.

  • Planned DR: In a planned move, an administrator initiates the move when there is no disaster event and the resources in the production site can be shut down gracefully. These operations are initiated mainly to perform a DR test drill, to move from one site to another, or when one of the sites needs to be taken offline for maintenance. In a planned move, the controller system (the KSYS node) handles the cleanup automatically.
  • Unplanned DR: In an unplanned scenario, whenever there is a failure or natural disaster, the admin is notified about the disaster event. Based on the situation, the admin can initiate an unplanned move. During an unplanned move, all virtual machines are shut down abruptly and the KSYS does not clean up the virtual machines on the production site. In that case, the admin must clean up the virtual machines manually.

Key terms used in this article

Refer to the following table to understand the key terms used in this article.

Term                             Description
Site 1                           Is the production site, where the workloads are running at a specific time (for example, India)
Site 2                           Is the backup site, which acts as a backup for the workload at a specific time (for example, Austin)
HMC 1_1                          Is an HMC on Site 1
HMC 2_1                          Is an HMC on Site 2
HOST 1_1                         Is a managed system (Host/CPC) on Site 1
HOST 2_1                         Is a managed system (Host/CPC) on Site 2
VIOS 1_1                         Is a primary VIOS on Site 1
VIOS 1_2                         Is a redundant VIOS on Site 1
VIOS 2_1                         Is a VIOS on Site 2
VM1                              Is a virtual machine
Central processor complex (CPC)  Is a physical collection of hardware that consists of the main storage, one or more central processors, timers, and channels
CG                               Is a consistency group, that is, a collection of base volumes in your storage array
KSYS                             Is a controlling system that provides a single point of control for the entire environment managed by the Geographically Dispersed Resiliency for Power Systems solution

Note: The demonstrations in this article use EMC Corporation's VMAX storage and Symmetrix Remote Data Facility (SRDF) replication.

Problem statement

As a best practice, multipathing with a dual Virtual I/O Server (VIOS) configuration is often deployed in production environments for redundancy, better performance, and flexibility during VIOS maintenance. In such a configuration, each VM has a virtual Fibre Channel (FC) adapter that is mapped to each VIOS. With multipath I/O, the VM can access the storage disks using two different paths, each provided by a separate VIOS. However, due to resource limitations, deploying the dual VIOS configuration might not always be possible. In such scenarios, consider a case where the host at the primary site has a dual VIOS setup but the host at the backup site has a single VIOS setup, as depicted in Figure 1.

Figure 1. Production site with dual VIOS and backup site with single VIOS

The environment shown in Figure 1 causes the following problems during DR operations:

During the verification phase of the Geographically Dispersed Resiliency solution, the KSYS verifies whether the hosts at the active site and the backup site have the same configuration. In this environment, the VIOS configuration mismatch causes the DR operation to fail.

If the VMs are instead started with a single VIOS configuration at the backup site by using the lose_vios_redundancy attribute (explained later in this article), the dual VIOS configuration is lost at the primary site when the VMs are later moved back to the active site.

Geographically Dispersed Resiliency using lose_vios_redundancy option

The ksysmgr command provides the lose_vios_redundancy attribute, which allows VMs that have a dual VIOS setup at the source site to be recovered with only a single VIOS instance at the backup site. By default, this attribute is set to no, which means that the dual VIOS setup is maintained during disaster recovery to the backup site.

Set the value of this attribute to yes to allow the DR operation when the backup site has a single VIOS instance, as shown in Figure 1.

# ksysmgr modify system lose_vios_redundancy=yes

After completing the disaster recovery operation successfully from Site 1 to Site 2 with the lose_vios_redundancy option set to yes, at Site 2 VM1's paths will be mapped to the single VIOS 2_1 as shown in Figure 2.

Figure 2. Path mappings after DR from Site 1 to Site 2

After Site 1 is recovered, suppose that a DR operation is initiated from Site 2 to Site 1. After the DR operation succeeds, Site 1 ends up with a VIOS configuration similar to the one at Site 2. That is, VM1's paths are mapped through either VIOS 1_1 or VIOS 1_2, but not both, as shown in Figure 3.

Figure 3. Possible path mappings after DR from Site 2 to Site 1

This shows that the initial configuration of Site 1 (that is, multipathing with a dual VIOS configuration) is lost, even though VM1 is moved back to Site 1. This is the problem discussed in this article, and it might be a concern in production environments. Hence, this article provides a solution to retain the original configuration of multipathing with dual VIOS after a DR operation.

Procedure to retain multipathing with dual VIOS configuration after DR

This section provides a high-level summary of the solution to the problem introduced in this article:

  • Prepare for the DR operation by setting the lose_vios_redundancy attribute, so that VMs with a dual VIOS setup at the source site can be recovered with only a single VIOS instance at the backup site.
  • Perform an unplanned DR operation from Site 1 to Site 2. This avoids the automatic cleanup of the dual VIOS configuration at Site 1.
  • Instead of performing a DR operation from Site 2 to Site 1, resync the active site consistency group, modify the configuration parameters, reverse the EMC disk mirroring, and activate the VM1 profile on Site 1.

Note: It is not recommended to perform any configuration changes after a DR (that is, on Site 2), because this procedure activates the VM profile that is saved on the production site (that is, Site 1).

VM with dual VIOS configuration setup at Site 1

Run the following commands to check the path information of VM1 (refer to Figure 4).

      # hostname
      # uname -L
      # lscfg -vpl fcs0 | grep "Hardware location code"
      # lscfg -vpl fcs1 | grep "Hardware location code"
      # lspath | grep hdisk0
Figure 4. VM accessing SAN disks from dual VIOS
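
To check the dual-path state from a script rather than by eye, one option is to count the paths that lspath reports as Enabled for the disk. The following is a minimal sketch and not part of the original procedure. The function name count_enabled_paths is hypothetical, and the sketch assumes the default lspath output, where each line lists the path status, the device name, and the parent adapter.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): count the Enabled paths
# for one disk. Reads lspath-style lines on stdin, assumed to have the
# form "<status> <device> <parent>", for example "Enabled hdisk0 fscsi0".
count_enabled_paths() {
    disk="$1"
    awk -v d="$disk" '$1 == "Enabled" && $2 == d { n++ } END { print n + 0 }'
}

# Possible usage on the VM:
#   lspath | count_enabled_paths hdisk0
```

With the dual VIOS setup described here, a result of 2 for hdisk0 would be the expected outcome; a lower number suggests that a path is missing or not enabled.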

The output of the lsmap -all -npiv command on VIOS 1_1 and VIOS 1_2 shows that the VM has the mappings required for a path through each VIOS (refer to Figure 5 and Figure 6).

Figure 5. Virtual FC mapping on VIOS 1_1
Figure 6. Virtual FC mapping on VIOS 1_2
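
The same mapping check can be summarized with a short filter over the lsmap output. This is a sketch under stated assumptions, not the article's method: the function name count_logged_in_vfc is hypothetical, and it assumes that each virtual FC mapping in the lsmap -all -npiv output carries a line that starts with Status:LOGGED_IN when the client adapter is logged in.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): count the virtual FC
# mappings that are logged in. Reads "lsmap -all -npiv" output on stdin.
# The pattern is anchored at the start of the line so that a
# "Status:NOT_LOGGED_IN" entry is not counted.
count_logged_in_vfc() {
    awk '/^Status:LOGGED_IN/ { n++ } END { print n + 0 }'
}

# Possible usage with output saved from each VIOS (file name is an example):
#   count_logged_in_vfc < lsmap_vios1_1.txt
```

Running the filter against the output from each VIOS gives a quick view of whether both VIOS 1_1 and VIOS 1_2 still serve an active mapping.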

Prepare DR operation using the lose_vios_redundancy attribute with value yes

Perform the following steps to prepare for the DR operation with the lose_vios_redundancy attribute set to yes:

  1. Create a cluster and add the KSYS node to it using the following command.
    # ksysmgr add ksyscluster <cluster name> ksysnodes=<node name>

    Add the CPC/host, HMC, and storage agents to both sites. Pair both CPCs.

    To learn more about ksyscluster, refer to Detailed steps for cluster creation.

    Figure 7. Cluster online on KSYS
  2. Initiate discovery on Site 1 (for example, India) using the following command:
    # ksysmgr -t discover site India
    Figure 8. Discovery on India site
  3. Modify the lose_vios_redundancy attribute to yes using the following command.
    # ksysmgr modify system lose_vios_redundancy=yes
    Figure 9. Setting the lose_vios_redundancy attribute to yes
  4. Perform verification checks at the site before initiating the DR using the following command.
    # ksysmgr -t verify site India
    Figure 10. Verification process on KSYS node
  5. Check the site details before initiating the DR using the following command.
    # ksysmgr q site
    Figure 11. Active and backup site details

    Use the following command to check the site ID.

    # lsrsrc IBM.VMR_SITE
    Figure 12. Site ID of both sites

    Use the following command to check the IBM.VMR_SITE class attribute (that is, ActiveSiteID).

    # lsrsrc -c IBM.VMR_SITE | grep ActiveSiteID
    Figure 13. Active site ID
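
Because the active site ID is checked again after the DR and changed manually later in the procedure, it can help to capture the value in a script. The following is a minimal sketch. The function name active_site_id is hypothetical, and the sketch assumes that the lsrsrc output contains a line of the form ActiveSiteID = <number>.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): print the ActiveSiteID
# value. Reads "lsrsrc -c IBM.VMR_SITE" output on stdin and assumes a
# line such as "ActiveSiteID = 1".
active_site_id() {
    awk -F'=' '/ActiveSiteID/ { gsub(/[ \t"]/, "", $2); print $2; exit }'
}

# Possible usage on the KSYS node:
#   id=$(lsrsrc -c IBM.VMR_SITE | active_site_id)
#   [ "$id" = "1" ] && echo "Site 1 is active"
```

Recording the value before the move gives a baseline to compare against when the ID is checked after the DR.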

Initiate an unplanned VM move from Site 1 to Site 2

Perform the following steps to initiate an unplanned VM move from Site 1 to Site 2:

  1. Invoke an unplanned move from the active site to the backup site using the following command to avoid automatic cleanup of Site 1.
    # ksysmgr move site from=India to=Austin dr_type=unplanned
    Figure 14. Unplanned move initiated from Site 1 to Site 2
  2. Confirm if the DR operation is successful.
    1. Ensure that the VIOS configuration is retained at Site 1.

      After disaster recovery, VM1 on HMC 1_1 at Site 1 is in the Not Activated state. Profile information is not removed in an unplanned DR, and therefore, the VIOS configuration is preserved.

      Figure 15. VM state on HMC 1_1
    2. Verify if the DR operation is successful at Site 2.

      After a DR, VM1 on HMC 2_1 at Site 2 is in the Running state.

      Figure 16. VM state on HMC 2_1
    3. Run the following command to confirm that the VM has a dual-path configuration after DR.
      # lspath | grep hdisk0
      Figure 17. Dual path disk on VM1 after DR
    4. Ensure that the disk has multiple paths after the DR.

      On VIOS 2_1, the output of the lsmap -all -npiv command confirms that VM1 has all its paths from VIOS 2_1.

      Figure 18. Virtual Fibre Channel adapter mapping on present active site (for example, Austin)
    5. Check the active site ID after DR.
      Figure 19. Active site ID changed to '2' after DR
      Figure 20. Active and backup sites after DR
    6. Run the following command to check the state of the consistency group after DR.
      # /usr/symcli/bin/symrdf -cg VMRDG_cluster1_India query -detail

      -cg refers to the consistency group name.

      Figure 21. Consistency group state right after DR
  3. After DR, the consistency group state will be Failed over. On the KSYS node, resync the active site consistency group (in this example, VMRDG_cluster1_Austin) to change the state to Consistent.

    Use the following command to resync the consistency group.

    # /opt/IBM/ksys/storages/EMC/resync_emc_srdf_cg -s 196800573 -e <any string> -g VMRDG_cluster1_Austin -i 10.40.0.209 -t <any number>

          -s  Active site storage ID
          -e  Eyecatcher (any string)
          -g  Consistency group (CG) name
          -i  IP address
          -t  Thread ID (any number)
    Figure 22. Resynchronizing the consistency group

    You can check the state of the consistency group after resync using the following command:

    # /usr/symcli/bin/symrdf -cg VMRDG_cluster1_India query -detail
    Figure 23. Consistency group state after resynchronizing the replication
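
The consistency group state is checked twice in the steps above, once right after the DR and once after the resync. A small filter can reduce the query output to a single keyword for use in scripts. This is only a sketch: the function name cg_state is hypothetical, the exact layout of the symrdf query output is not assumed, and the filter simply looks for the two state names mentioned in this article (Failed over and Consistent) anywhere in the text.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): reduce the output of
# "symrdf -cg <name> query -detail" (read on stdin) to one keyword.
# Prints failed_over if any line mentions "Failed Over", otherwise
# consistent if any line contains the word "Consistent", else unknown.
cg_state() {
    awk '
        {
            # Pad with spaces so the word test also works at line edges.
            line = " " tolower($0) " "
            if (line ~ /failed[ \t]+over/)         failed = 1
            if (line ~ /[^a-z]consistent[^a-z]/)   ok = 1
        }
        END {
            if (failed)  print "failed_over"
            else if (ok) print "consistent"
            else         print "unknown"
        }'
}

# Possible usage on the KSYS node:
#   /usr/symcli/bin/symrdf -cg VMRDG_cluster1_India query -detail | cg_state
```

The filter gives failed_over priority, so a group is only reported as consistent when no line still shows a failed-over pair.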

Procedure to preserve dual-VIOS configuration of VM at Site 1

Perform the following steps to preserve the dual-VIOS configuration of the VM at Site 1.

  1. Deactivate VM1 on Site 2.
    Figure 24. State of VM1 on Site 2
  2. On the KSYS node, change the EMC replication from the current backup site (Site 2) to the production site (Site 1) using the following command:
    # /opt/IBM/ksys/storages/EMC/reverse_emc_srdf_cg -s 196800508 -e 5EoV6 -g VMRDG_cluster1_India -i 10.40.0.170 -t 892 -m UNPLAN
    Figure 25. Reverse replication of CG

    Use the following command to show the state of the consistency group after reversing the replication or mirroring on simulator.

    # /usr/symcli/bin/symrdf -cg VMRDG_cluster1_India query -detail
    Figure 26. Consistency group state after reversing the replication
  3. Modify the active site ID to 1. Because the move back to Site 1 is done without the ksysmgr move command, set ActiveSiteID to 1 manually using the following command. Together with the reversed mirroring, this retains the original configuration on Site 1.
    # chrsrc -c IBM.VMR_SITE ActiveSiteID=1
    Figure 27. Changing the resource attribute ActiveSiteID to 1
  4. Activate the VM1 profile on Site 1.
    Figure 28. State of the VM1 on Site 1 from HMC 1_1 GUI
  5. Invoke cleanup on Site 2.

    The following command cleans up the VM1 configuration on Site 2.

    # ksysmgr cleanup site <site name>
    Figure 29. Cleanup process on Site 2
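
Because the order of these steps matters and two of them are manual HMC actions, it can be useful to print the sequence as a checklist before running anything. The following is a dry-run sketch: the function name failback_plan is hypothetical, it only prints text and executes nothing, and the commands and options it prints are the ones shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): print the failback steps
# in order. Nothing is executed.
# Arguments: <storage-id> <eyecatcher> <cg-name> <ip> <thread-id> <cleanup-site>
failback_plan() {
    if [ "$#" -ne 6 ]; then
        echo "usage: failback_plan <storage-id> <eyecatcher> <cg-name> <ip> <thread-id> <cleanup-site>" >&2
        return 1
    fi
    printf '%s\n' "1. MANUAL: deactivate VM1 on Site 2 (HMC)"
    printf '%s\n' "2. /opt/IBM/ksys/storages/EMC/reverse_emc_srdf_cg -s $1 -e $2 -g $3 -i $4 -t $5 -m UNPLAN"
    printf '%s\n' "3. chrsrc -c IBM.VMR_SITE ActiveSiteID=1"
    printf '%s\n' "4. MANUAL: activate the VM1 profile on Site 1 (HMC)"
    printf '%s\n' "5. ksysmgr cleanup site $6"
}

# Example with the values used in this article:
#   failback_plan 196800508 5EoV6 VMRDG_cluster1_India 10.40.0.170 892 Austin
```

Printing the plan first makes it easier to review the storage ID, consistency group name, and site name before the replication direction is reversed.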

Conclusion

This article demonstrated the procedure to perform the disaster recovery operation with dual VIOS configuration on the production site and single VIOS configuration on the backup site. It also explained the procedure to preserve the original configuration of multipathing with dual VIOS on the production site.

Published in the IBM developerWorks AIX and UNIX zone. Publish date: 04122017.