A simplified approach to hardware upgrade using IBM Geographically Dispersed Resiliency for Power Systems
Concepts and step-by-step walkthrough on how to keep application workloads running when you need to plan for downtime of hosts for hardware upgrade
What is IBM Geographically Dispersed Resiliency for Power Systems?
The IBM® Geographically Dispersed Resiliency for Power Systems™ solution is a disaster recovery (DR) solution that is easy to deploy and provides an automated process to recover virtual machines (VMs) at the remote or failover site during a disaster. Because disaster recovery of applications and services is a key component of business continuity, the IBM Geographically Dispersed Resiliency solution helps customers automate the disaster recovery process when a failure occurs. Disaster recovery solutions are mainly based on either cluster-based technology or virtual machine restart-based technology. This solution provides an easy deployment model that uses a controller system (called KSYS) to monitor the entire virtual machine environment. This solution also provides flexible failover policies and storage replication management.
You can learn more about Geographically Dispersed Resiliency for Power Systems at the IBM developerWorks® wiki documents: Why GDR is the ideal DR solution for Power Systems and FAQ.
Every few years, hardware systems go through an upgrade cycle to meet growing business needs such as more customers, increased traffic, consolidation onto new high-capacity servers, and so on. This article explains the advantages of using Geographically Dispersed Resiliency for Power Systems for planned hardware upgrades, followed by a detailed walk-through of the steps for performing the upgrade. Note that the scope of hardware upgrade discussed in this article is limited to host central processor complexes (CPCs), and does not cover upgrades to storage on the storage area network (SAN).
Before we dive into the technique, advantages, and steps, we will recap the key terms used in IBM Geographically Dispersed Resiliency for Power Systems. You can skip this section if you are familiar with these terms from our documentation and previous articles.
- KSYS: KSYS is the logical partition (LPAR), currently an IBM AIX® LPAR, where the Geographically Dispersed Resiliency software is deployed. KSYS acts as the orchestrator that monitors, manages, and moves VMs from one site to another. KSYS stands for the C(K)ontroller System LPAR. KSYS is configured by using the ksysmgr command in the following format:
ksysmgr ACTION CLASS [NAME] [ATTRIBUTES...]
- Site: This is a logical name that represents the primary or active site
and the disaster recovery or backup site. Sites must be created at the KSYS level. All the
Hardware Management Consoles (HMCs), hosts, Virtual I/O Server (VIOS) instances, and
storage devices are mapped to one of the sites. Sites can be of the following two types:
- Active site (or primary site): This refers to the current site where the workloads are running at a specific time.
- Backup site (or disaster recovery site): This refers to the site that acts as a backup for the workload at a specific time. During a disaster or a potential disaster, workloads are moved to the backup site.
- Host: A host is a managed system in the HMC that is primarily used to run workloads. Each host is identified by its Universally Unique Identifier (UUID) as tracked in the HMC. A host pair indicates a set of hosts that are paired across the sites for high availability and disaster recovery.
- Virtual machines: Virtual machines, also known as logical partitions, are associated with specific VIOS partitions that provide virtualized storage to run a workload. A host can contain multiple virtual machines.
- Storage agents: A disaster recovery solution requires organized storage management because storage is a vital entity in any data center. The GDR solution relies on storage replication to copy data from the active site to the backup site.
- Discovery of site: After the initial configuration is complete, the KSYS node discovers all the hosts that are managed by the HMCs in both the active and the backup sites and displays the status. During discovery, the KSYS node monitors the discovery of all LPARs or VMs in all the managed hosts within the selected site. The KSYS node collects the configuration information for each LPAR, and displays the status. The KSYS node also discovers the disks of each VM and checks whether the VMs are currently configured for the mirroring of storage devices.
- Verification of site: In the verification phase, the KSYS node fetches information from the HMC to check whether the backup site is capable of hosting the VMs during a disaster. The KSYS node also verifies storage replication-related details.
- Disaster recovery: After the verification phase, the KSYS node keeps monitoring the active site for failures or issues in any of the site's resources. When a planned or unplanned outage occurs and the situation requires disaster recovery, you must manually initiate the recovery by using the ksysmgr command to move the virtual machines to the backup site.
- Planned DR: A planned move is an operation in which an administrator initiates a move when there is no disaster event and the resources in the active site can be shut down gracefully. These types of operations are initiated mainly to perform a DR test drill, move from one site to another, or when one of the sites needs to be taken offline for maintenance.
- Unplanned DR: In an unplanned DR scenario, a disaster such as a power failure has brought down the active site and it can no longer be reached from the backup site. In such a situation, the VMs need to be started on the backup site and the software stack brought back online to resume the business applications. Because the disaster has taken down the active site, the resources in the active site are no longer reachable and cannot be automatically released back into the enterprise pool by Geographically Dispersed Resiliency (KSYS). After the active site is up again, the administrator can use KSYS to manually initiate a cleanup of the VMs of the active site.
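The ksysmgr format described above (ACTION, CLASS, an optional NAME, and optional ATTRIBUTES) can be illustrated with a small helper that only assembles an argument list and does not run anything. This is a minimal sketch: the function name ksysmgr_command and the host name hostA are hypothetical, and the placement of flags such as -t before the action is an assumption based on the commands shown later in this article.

```python
def ksysmgr_command(action, resource_class, name=None, flags=(), **attributes):
    """Assemble a ksysmgr argument list: ksysmgr [FLAGS] ACTION CLASS [NAME] [ATTRIBUTES...].

    Nothing is executed here; the function only builds the list.
    """
    argv = ["ksysmgr", *flags, action, resource_class]
    if name is not None:
        argv.append(name)
    # Attributes are rendered as key=value pairs, in the order they were passed.
    argv.extend(f"{key}={value}" for key, value in attributes.items())
    return argv


if __name__ == "__main__":
    # "from" is a Python keyword, so attributes are passed as a dictionary here.
    move = ksysmgr_command(
        "move", "site", flags=("-t",),
        **{"from": "Austin", "to": "India", "dr_type": "planned"},
    )
    print(" ".join(move))
    # hostA is a hypothetical host name used only for illustration.
    print(" ".join(ksysmgr_command("pair", "host", "hostA", pair="none")))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later passed to a process launcher.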
Typical planned hardware upgrade
To bring out the value proposition of Geographically Dispersed Resiliency, it helps to first look at how time consuming and cumbersome a planned hardware upgrade cycle typically is. Multiple approaches are available, but the usual starting point is Live Partition Mobility (LPM). However, if a disaster strikes the production site in the middle of an LPM operation, Geographically Dispersed Resiliency cannot recover the virtual machines, because it needs a paired host for the VMs to be recoverable on the backup site. LPM is also limited to certain distance, network, and storage configurations. If the production site and the backup site are too far apart for LPM, and the VMs need to be shut down for planned maintenance of the data center or for replacement of the entire data center, you can use Geographically Dispersed Resiliency to move the VMs to the backup site and move them back to the production site when the maintenance is complete.
Using Geographically Dispersed Resiliency for planned hardware upgrade
If your environment is already enabled for Geographically Dispersed Resiliency (its benefits are covered in our previous articles, such as Introduction to GDR and Business Continuity and Disaster Recovery as a Service (DRaaS) offering using IBM Geographically Dispersed Resiliency Solution for Power Systems), your hardware upgrade is considerably simplified as well. The technique is simple and includes the following steps:
- Invoke a planned DR from the production site to the backup site.
- Unpair the production site host and the backup site host.
- Replace or upgrade the production site host.
- Pair the backup site host with the new production site host.
- Invoke a planned DR from the backup site to the production site.
Figure 1. Steps to perform planned hardware upgrade
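The five steps above can be laid out as an ordered plan of command lines before anything is run. The following is a rough sketch that only builds strings from the ksysmgr commands used in the walkthrough of this article. The function name upgrade_plan and the host names p7_host, p8_host, and india_host are hypothetical, and the sketch assumes that after the first move the original backup site is the active site, so unpairing and pairing are issued against its host.

```python
def upgrade_plan(production_site, backup_site, old_host, new_host, backup_host):
    """Return the five upgrade steps as (description, [command lines]) pairs.

    The commands are only assembled as text; nothing is executed.
    """
    return [
        ("Planned DR from the production site to the backup site",
         [f"ksysmgr -t move site from={production_site} to={backup_site} dr_type=planned"]),
        ("Unpair the existing host pair",
         [f"ksysmgr pair host {backup_host} pair=none"]),
        ("Replace the production site host in the KSYS configuration",
         [f"ksysmgr add host {new_host} site={production_site}",
          f"ksysmgr remove host {old_host}"]),
        ("Pair the current active site host with the new production site host",
         [f"ksysmgr pair host {backup_host} pair={new_host}"]),
        ("Planned DR from the backup site back to the production site",
         [f"ksysmgr -t discover site {backup_site}",
          f"ksysmgr -t verify site {backup_site}",
          f"ksysmgr -t move site from={backup_site} to={production_site} dr_type=planned"]),
    ]


if __name__ == "__main__":
    # Site names come from the article's example; host names are made up.
    plan = upgrade_plan("Austin", "India", "p7_host", "p8_host", "india_host")
    for number, (title, commands) in enumerate(plan, start=1):
        print(f"Step {number}: {title}")
        for command in commands:
            print(f"    {command}")
```

Printing the plan first gives the administrator a chance to review host and site names before any change is made to the KSYS configuration.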
Walkthrough of steps
As a typical example, suppose that we want to replace the host CPC/managed system, moving from IBM POWER7® to IBM POWER8®. Before invoking a planned disaster recovery, ensure that the ksyscluster is active and that the daemon used by KSYS is running.
- Run the following command to check the state of the ksyscluster:
ksysmgr query cluster
Figure 2. ksysmgr query cluster output
- Run the following command to check the IBM.VMR daemon used by KSYS:
lssrc -s "IBM.VMR"
Figure 3. Command to check the status of IBM.VMR
- Check the current site and host configuration using the following commands:
ksysmgr query site
Figure 4. ksysmgr query site output
ksysmgr query host
Figure 5. ksysmgr query host output
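The daemon check above can be scripted by reading the Status column that lssrc prints. The following is a minimal sketch, assuming the usual lssrc -s layout of a header line followed by one line per subsystem with the status as the last field. The sample text, including the group name and PID, is illustrative only and was not captured from a real KSYS node, and the function name subsystem_status is hypothetical.

```python
def subsystem_status(lssrc_output, subsystem="IBM.VMR"):
    """Return the status reported for a subsystem in lssrc output, or None if absent.

    The status is taken from the last whitespace-separated field of the matching
    line, which also works when the PID column is empty for a stopped subsystem.
    """
    for line in lssrc_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == subsystem:
            return fields[-1]
    return None


# Illustrative sample text (assumed layout, not real captured output).
SAMPLE_ACTIVE = """Subsystem         Group            PID          Status
 IBM.VMR          rsct_rm          9437686      active
"""
SAMPLE_STOPPED = """Subsystem         Group            PID          Status
 IBM.VMR          rsct_rm                       inoperative
"""

if __name__ == "__main__":
    for sample in (SAMPLE_ACTIVE, SAMPLE_STOPPED):
        status = subsystem_status(sample)
        ready = "ok to proceed" if status == "active" else "do not proceed"
        print(f"IBM.VMR is {status}: {ready}")
```

On a real KSYS node, the text passed to subsystem_status would come from running lssrc -s "IBM.VMR" and capturing its standard output.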
- Invoke a planned disaster recovery from the active site to the backup site. In this
article, we invoke a planned disaster recovery from Austin to India.
Run the following command:
ksysmgr -t move site from=<active site> to=<backup site> dr_type=planned
Figure 6. Command to invoke DR
- Run the following command to unpair the existing host pair:
ksysmgr pair host <active site hostname> pair=none
Figure 7. Command to unpair two hosts using ksysmgr
- Run the following command to check if the hosts are unpaired:
ksysmgr query host
Figure 8. ksysmgr query host output
- Observe that the Pair field shows None after unpairing.
- Add the new host/managed system at the upgraded level/version by using the following command:
ksysmgr add host <hostname> site=<site name>
Figure 9. Command to add host to KSYS configuration
- Remove the older level/version of the host/managed system using the following command:
ksysmgr remove host <hostname>
Figure 10. Command to remove host from KSYS configuration
- Pair the newly added host/managed system with the existing host/managed system. Ensure that the pairing is done from the current active site host to the backup site host. Query the site first to confirm which site is currently active.
Figure 11. Command to query site
- Run the following command to pair the active site host with the backup site host:
ksysmgr pair host <active site hostname> pair=<backup site hostname>
Figure 12. Command to pair hosts using ksysmgr
- Check the host configuration to ensure that the active site host and the backup site host are paired.
Figure 13. Command to query host/managed system in KSYS configuration
- Invoke a planned disaster recovery from the current active site (the original backup site) back to the production site. To do this, discover the current active site, verify the site, and then invoke the planned disaster recovery.
- Note that, when a new host is paired, KSYS is designed to discover all VMs in the host by default. If the newly paired host or the current host has additional VMs that were not managed by KSYS before, you need to unmanage them by using the following command:
ksysmgr unmanage vm name=<VM name> host=<hostname>
- Use the following command to discover the current active site:
ksysmgr -t discover site <active site name>
Figure 14. Command to discover active site using ksysmgr
- Use the following command to verify the current active site:
ksysmgr -t verify site <active site name>
Figure 15. Command to verify the active site using ksysmgr
- Use the following command to invoke a planned DR:
ksysmgr -t move site from=<active site> to=<backup site> dr_type=planned
Figure 16. Command to invoke DR using ksysmgr
The virtual machines are now running on the original production site, which has an upgraded managed system/CPC.
In addition to the core value proposition of high availability, Geographically Dispersed Resiliency facilitates a simplified approach to performing a planned hardware upgrade with considerably less downtime.
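The discover, verify, and move sequence used for the return trip only makes sense if each command succeeds before the next one starts. The following sketch shows one way to enforce that ordering. It assumes that ksysmgr returns a nonzero exit status on failure, which this article does not confirm. The function name run_in_order and the runner parameter are hypothetical, and the example uses a stand-in runner so that nothing is actually executed.

```python
import shlex
import subprocess


def run_in_order(commands, runner=None):
    """Run command lines one at a time and stop at the first failure.

    Returns a list of (command, exit_code) pairs for the commands attempted.
    Assumption: a nonzero exit code means the command failed.
    """
    if runner is None:
        # Default runner executes the command for real and returns its exit code.
        def runner(argv):
            return subprocess.run(argv).returncode

    results = []
    for command in commands:
        exit_code = runner(shlex.split(command))
        results.append((command, exit_code))
        if exit_code != 0:
            break  # Do not continue to the move if discover or verify failed.
    return results


if __name__ == "__main__":
    # Site names are taken from the article's example (India is active after the first move).
    return_trip = [
        "ksysmgr -t discover site India",
        "ksysmgr -t verify site India",
        "ksysmgr -t move site from=India to=Austin dr_type=planned",
    ]

    def dry_run(argv):
        # Stand-in runner: prints the command instead of executing it.
        print("would run:", " ".join(argv))
        return 0

    for command, code in run_in_order(return_trip, runner=dry_run):
        print(f"{command} -> exit code {code}")
```

Passing the runner as a parameter keeps the ordering logic testable on a workstation that has no KSYS software installed.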
- IBM Geographically Dispersed Resiliency for IBM Power Systems
- FAQs about IBM Geographically Dispersed Resiliency for IBM Power Systems
- Why Geographically Dispersed Resiliency is an ideal solution for IBM Power Systems