A simplified approach to hardware upgrade using IBM Geographically Dispersed Resiliency for Power Systems

Concepts and a step-by-step walkthrough for keeping application workloads running when hosts must be taken down for a planned hardware upgrade


What is IBM Geographically Dispersed Resiliency for Power Systems?

The IBM® Geographically Dispersed Resiliency for Power Systems™ (GDR) solution is a disaster recovery (DR) solution that is easy to deploy and provides an automated process to recover virtual machines (VMs) at the remote or failover site during a disaster. Because disaster recovery of applications and services is a key component of business continuity, the IBM Geographically Dispersed Resiliency solution helps customers automate the recovery process when a failure occurs. Disaster recovery solutions are mainly based either on clustering technology or on virtual machine restart technology. This solution provides an easy deployment model that uses a controller system (called KSYS) to monitor the entire virtual machine environment. It also provides flexible failover policies and storage replication management.

You can learn more about Geographically Dispersed Resiliency for Power Systems in the IBM developerWorks® wiki documents "Why GDR is the ideal DR solution for Power Systems" and "FAQ".

Every few years, hardware systems go through an upgrade cycle to meet growing business needs such as more customers, increased traffic, or consolidation onto new high-capacity servers. This article explains the advantages of using Geographically Dispersed Resiliency for Power Systems for planned hardware upgrades, followed by a detailed walkthrough of the steps for performing the upgrade. Note that the scope of the hardware upgrade discussed in this article is the host central processor complexes (CPCs), not the storage on the storage area network (SAN).

Before we dive into the technique, its advantages, and the steps, we recap the key terminology used in IBM Geographically Dispersed Resiliency for Power Systems. You can skip this section if you are already familiar with these terms from our documentation and previous articles.

  • KSYS: KSYS is the logical partition (LPAR), currently an IBM AIX® LPAR, where the Geographically Dispersed Resiliency software is deployed. KSYS acts as the orchestrator that monitors, manages, and moves VMs from one site to another. The name KSYS stands for the controller ("Kontroller") system LPAR. KSYS is configured by using the ksysmgr command in the following format: ksysmgr ACTION CLASS [NAME] [ATTRIBUTES...]
  • Site: A site is a logical name that represents either the primary (active) site or the disaster recovery (backup) site. Sites must be created at the KSYS level. All the Hardware Management Consoles (HMCs), hosts, Virtual I/O Server (VIOS) instances, and storage devices are mapped to one of the sites. Sites can be of the following two types:
    • Active site (or primary site): This refers to the current site where the workloads are running at a specific time.
    • Backup site (or disaster recovery site): This refers to the site that acts as a backup for the workload at a specific time. During a disaster or a potential disaster, workloads are moved to the backup site.
  • Host: A host is a managed system in the HMC that is primarily used to run workloads. Each host is identified by its Universally Unique Identifier (UUID) as tracked in the HMC. A host pair is a set of hosts that are paired across the sites for high availability and disaster recovery.
  • Virtual machines: Virtual machines, also known as logical partitions, are associated with specific VIOS partitions that provide virtualized storage for running a workload. A host can contain multiple virtual machines.
  • Storage agents: A disaster recovery solution requires well-organized storage management, because storage is a vital entity in any data center. The GDR solution relies on storage replication to copy data from the active site to the backup site, and KSYS interacts with the storage subsystems at each site through storage agents.
  • Discovery of site: After the initial configuration is complete, the KSYS node discovers all the hosts that are managed by the HMCs in both the active and the backup sites and displays their status. During discovery, the KSYS node monitors the discovery of all LPARs or VMs in all the managed hosts within the selected site, collects the configuration information for each LPAR, and displays the status. The KSYS node also discovers the disks of each VM and checks whether they are currently configured for storage mirroring.
  • Verification of site: In the verification phase, the KSYS node fetches information from the HMC to check whether the backup site is capable of hosting the VMs during a disaster. The KSYS node also verifies storage replication-related details.
  • Disaster recovery: After the verification phase, the KSYS node keeps monitoring the active site for failures or issues in any of its resources. When a planned or unplanned outage occurs and the situation requires disaster recovery, you must manually initiate the recovery by using the ksysmgr command to move the virtual machines to the backup site.
  • Planned DR: A planned move is an operation in which an administrator initiates a move when there is no disaster event and the resources in the active site can be shut down gracefully. Such operations are initiated mainly to perform a DR test drill, to move workloads from one site to another, or to take one of the sites offline for maintenance.
  • Unplanned DR: In an unplanned DR scenario, a disaster such as a power failure has brought down the active site, and the active site can no longer be reached from the backup site. In such a situation, the VMs need to be started on the backup site and the software stack brought back online to resume the business applications. Because the active site is down, its resources are no longer reachable and cannot be automatically released back into the enterprise pool by Geographically Dispersed Resiliency (KSYS). After the active site is back up, the administrator can use KSYS to manually initiate a cleanup of the VMs on that site.
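
Every ksysmgr invocation in this article follows the ACTION CLASS [NAME] [ATTRIBUTES...] form described under KSYS. As a reading aid, the following POSIX shell sketch splits a ksysmgr argument list into those parts. The function name ksys_parse is hypothetical and not part of the product; the sketch assumes that NAME is the one optional word without an equals sign, that every attribute is a key=value pair, and that leading options such as -t can simply be skipped. It only prints, and never calls ksysmgr.

```shell
#!/bin/sh
# Hypothetical reading aid (not part of GDR): split ksysmgr arguments into
# ACTION CLASS [NAME] [ATTRIBUTES...]. It only prints; it never runs ksysmgr.
# Assumptions: options such as -t come before ACTION, NAME is the only word
# without '=', and each ATTRIBUTE has the form key=value.
ksys_parse() {
    # Skip leading options (for example -t)
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -*) shift ;;
            *) break ;;
        esac
    done
    if [ "$#" -lt 2 ]; then
        echo "usage: ksys_parse [-opt] ACTION CLASS [NAME] [ATTRIBUTES...]" >&2
        return 1
    fi
    action=$1
    class=$2
    shift 2
    name=""
    attrs=""
    for word in "$@"; do
        case "$word" in
            *=*) attrs="${attrs:+$attrs }$word" ;;  # key=value attribute
            *) name=$word ;;                        # optional NAME
        esac
    done
    # "-" marks a part that was not given
    printf 'action=%s class=%s name=%s attributes=%s\n' \
        "$action" "$class" "${name:--}" "${attrs:--}"
}

# Commands from the walkthrough; host names are placeholders
ksys_parse query cluster
ksys_parse pair host austin_host pair=none
ksys_parse -t move site from=Austin to=India dr_type=planned
```

For example, the second call prints action=pair class=host name=austin_host attributes=pair=none, which shows that pair=none is an attribute rather than a host name.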

Typical planned hardware upgrade

To bring out the value proposition of performing planned hardware upgrades with Geographically Dispersed Resiliency, it is important to shed light on how time-consuming and cumbersome typical upgrade cycles are. Multiple approaches are available, but the usual starting point is Live Partition Mobility (LPM). However, if a disaster strikes the production site in the middle of an LPM operation, the virtual machines cannot be recovered with Geographically Dispersed Resiliency, because Geographically Dispersed Resiliency needs a paired host for the VMs to be recoverable on the backup site. LPM is also limited to certain distance, network, and storage configurations. When the production site and the backup site are farther apart than LPM supports, or when the VMs must be shut down for planned data center maintenance or an entire data center replacement, you can use Geographically Dispersed Resiliency to move the VMs to the backup site and move them back to the production site when the maintenance is complete.

Using Geographically Dispersed Resiliency for planned hardware upgrade

If your environment is already enabled for Geographically Dispersed Resiliency to gain the benefits of GDR covered in our previous articles, such as Introduction to GDR and Business Continuity and Disaster Recovery as a Service (DRaaS) offering using IBM Geographically Dispersed Resiliency Solution for Power Systems, your hardware upgrade is considerably simplified as well. The technique includes the following steps:

  1. Invoke a planned DR from the production site to the backup site.
  2. Unpair the production site host from the backup site host.
  3. Replace or upgrade the production site host.
  4. Pair the backup site host with the new production site host.
  5. Invoke a planned DR from the backup site back to the production site.
Figure 1. Steps to perform planned hardware upgrade
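
The five steps map onto the ksysmgr commands used in the walkthrough that follows. The shell sketch below strings them together under stated assumptions: the site names Austin and India come from the walkthrough, while the host names (austin_p7, austin_p8, india_host) and the DRY_RUN switch are hypothetical. By default it only prints the commands; the physical replacement in step 3 remains a manual action, and the pre-checks and the handling of extra VMs described in the walkthrough are not included.

```shell
#!/bin/sh
# Sketch of the five-step planned hardware upgrade using the ksysmgr commands
# from the walkthrough. Host names are hypothetical placeholders.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
PROD_SITE=${PROD_SITE:-Austin}     # original production site
DR_SITE=${DR_SITE:-India}          # backup site
OLD_HOST=${OLD_HOST:-austin_p7}    # hypothetical POWER7 host at Austin
NEW_HOST=${NEW_HOST:-austin_p8}    # hypothetical POWER8 host at Austin
DR_HOST=${DR_HOST:-india_host}     # hypothetical paired host at India

# Print or run one command; in real mode, stop at the first failure
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# Wait for an operator to finish a manual task (only in real mode)
manual() {
    echo "# manual: $*"
    if [ "$DRY_RUN" != 1 ]; then
        printf 'Press Enter when done: ' >&2
        read -r _
    fi
}

upgrade_plan() {
    # 1. Planned DR from the production site to the backup site
    run ksysmgr -t move site from="$PROD_SITE" to="$DR_SITE" dr_type=planned
    # 2. Unpair; the India host is now the active site host
    run ksysmgr pair host "$DR_HOST" pair=none
    # 3. Replace the hardware, then swap the host in the KSYS configuration
    manual "replace $OLD_HOST with $NEW_HOST at $PROD_SITE"
    run ksysmgr add host "$NEW_HOST" site="$PROD_SITE"
    run ksysmgr remove host "$OLD_HOST"
    # 4. Pair the current active site host with the new backup site host
    run ksysmgr pair host "$DR_HOST" pair="$NEW_HOST"
    # 5. Rediscover, verify, and move the workloads back to production
    run ksysmgr -t discover site "$DR_SITE"
    run ksysmgr -t verify site "$DR_SITE"
    run ksysmgr -t move site from="$DR_SITE" to="$PROD_SITE" dr_type=planned
}

upgrade_plan
```

With DRY_RUN=0 the sketch executes each command in turn and stops at the first one that fails; in practice you would still inspect the output of each step, as the walkthrough does, before moving on.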

Walkthrough of steps

Let's take a typical example in which we want to replace an IBM POWER7® host CPC (managed system) with an IBM POWER8® one. Before invoking a planned disaster recovery, ensure that the ksyscluster is active and that the daemon used by KSYS is active.

  1. Run the following command to check the state of the ksyscluster:
    ksysmgr query cluster
    Figure 2. ksysmgr query cluster output
  2. Run the following command to check the status of the IBM.VMR daemon used by KSYS:
    lssrc -s "IBM.VMR"
    Figure 3. command to check status of IBM.VMR
  3. Check the current site and host configuration using the following commands:
    ksysmgr query site
    ksysmgr query host
    Figure 4. ksysmgr query site output
    Figure 5. ksysmgr query host output
  4. Invoke a planned disaster recovery from the active site to the backup site. In this article, we invoke a planned disaster recovery from Austin to India.

    Run the following command:

    ksysmgr -t move site from=<active site> to=<backup site> dr_type=planned
    Figure 6. Command to invoke DR
  5. Run the following command to unpair the existing host pair:
    ksysmgr pair host <active site hostname> pair=none
    Figure 7. Command to pair two hosts using ksysmgr
  6. Run the following command to check if the hosts are unpaired:
    ksysmgr query host
    Figure 8. ksysmgr query host output
  7. Observe that the Pair field shows None after unpairing.
  8. Add the new, upgraded host (managed system) to the KSYS configuration using the following command:
    ksysmgr add host <new hostname> site=<site name>
    Figure 9. Command to add host to KSYS configuration
  9. Remove the old host (managed system) from the KSYS configuration using the following command:
    ksysmgr remove host <old hostname>
    Figure 10. Command to remove host from KSYS configuration
  10. Before pairing the newly added host with the existing host at the other site, check which site is currently active, because the pairing must be done from the current active site host to the backup site host:
    ksysmgr query site
    Figure 11. Command to query site
  11. Run the following command to pair the active site host with the backup site host:
    ksysmgr pair host <active site hostname> pair=<backup site hostname>
    Figure 12. Command to pair hosts using ksysmgr
  12. Check the host configuration to ensure that the active site host and the backup site host are paired:
    ksysmgr query host
    Figure 13. Command to query host/managed system in KSYS configuration
  13. Invoke a planned disaster recovery from the current active site (India) back to the original production site (Austin). To do so, discover the current active site, verify it, and then invoke the planned move.
    1. Note that when a new host is paired, KSYS by default discovers all the VMs on that host. If the newly paired host or the current host has VMs that were not managed by KSYS before, unmanage them by using the ksysmgr command:
      ksysmgr unmanage vm name=<VM name> host=<hostname>
    2. Use the following command to discover the current active site:
      ksysmgr -t discover site <active site name>
      Figure 14. Command to discover active site using ksysmgr
    3. Use the following command to verify the current active site:
      ksysmgr -t verify site <active site name>
      Figure 15. Command to verify the active site using ksysmgr
    4. Use the following command to invoke a planned DR:
      ksysmgr -t move site from=<active site> to=<backup site> dr_type=planned
      Figure 16. Command to invoke DR using ksysmgr
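
Two parts of the walkthrough lend themselves to a little scripting: the daemon pre-check in step 2 and the unmanage note in step 13. The sketch below is hypothetical (the function names and VM names are not part of the product). It assumes the standard lssrc -s layout, a header line followed by a line whose last column is the subsystem status such as active or inoperative, and checks only that column. It deliberately does not parse the ksysmgr query cluster output; compare that against Figure 2 by eye.

```shell
#!/bin/sh
# Hypothetical helpers for the walkthrough; not part of the GDR product.

# Succeeds if lssrc output on stdin shows IBM.VMR with status "active".
# Assumes the usual lssrc -s layout (Subsystem Group PID Status) with the
# status in the last column; the PID column is empty when not running.
vmr_is_active() {
    awk '$1 == "IBM.VMR" { found = 1; if ($NF == "active") ok = 1 }
         END { exit !(found && ok) }'
}

# Print one unmanage command per VM that KSYS should not manage (step 13).
# Usage: unmanage_cmds HOST VM...
unmanage_cmds() {
    host=$1
    shift
    for vm in "$@"; do
        echo "ksysmgr unmanage vm name=$vm host=$host"
    done
}

# Pre-check before the planned move (steps 1 and 2); run on the KSYS LPAR
precheck() {
    ksysmgr query cluster    # review the state manually; not parsed here
    if ! lssrc -s IBM.VMR | vmr_is_active; then
        echo "IBM.VMR is not active on the KSYS LPAR" >&2
        return 1
    fi
}
```

On the KSYS LPAR you would call precheck before step 4, and for step 13 something like unmanage_cmds austin_p8 test_vm1 test_vm2 prints the commands so that they can be reviewed before they are run.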

The virtual machines are now running on the original production site, which has an upgraded managed system (CPC).


In addition to its core value proposition of high availability, Geographically Dispersed Resiliency facilitates a simplified approach to performing a planned hardware upgrade with considerably less downtime.

