VM Recovery Manager HA limitations
Consider the following restrictions for the VM Recovery Manager HA solution.
KSYS limitations
- The KSYS subsystem follows the KSYS_<peer domain>_<HG_ID> format to name an HA SSP. The KSYS subsystem uses this format to differentiate between an HA SSP and a user-defined SSP. Therefore, you must not use this format for any user-defined SSP.
- The following commands can be run without considering any of the policies:
ksysmgr verify host_group <host_group_name>
ksysmgr lpm host <host_name> action=validation
Therefore, successful completion of the verification and validation operations does not mean that the virtual machines can be relocated successfully.
- When you remove the KSYS cluster, the KSYS subsystem fails to delete HA-specific VM
and VIOS adapters if the cleanup operation continues for a long time. You must delete the VIOS
adapters manually to avoid inconsistencies across the Virtual I/O Servers. If you create the KSYS
cluster again, the KSYS subsystem can reuse the previous HA-specific adapters.
- Workaround:
- To remove the KSYS cluster, run the following command with the -f option:
ksysmgr -f remove ksyscluster <ksyscluster_name>
- To remove the Shared Storage Pool (SSP) cluster, run the following command on one of the Virtual
I/O Servers (VIOS) of the SSP cluster:
cluster -remove
- To check if the hsmon daemon is stopped in all Virtual I/O Servers (VIOS),
run the following command:
lssrc -s ksys_hsmon
- To stop the hsmon daemon in the VIOS, run the following command in all
Virtual I/O Servers:
stopsrc -s ksys_hsmon -c
- The KSYS subsystem supports the high availability disk of the Shared Storage Pool (SSP) cluster only when the SSP is created from the KSYS subsystem. The KSYS subsystem does not display the high availability disk in any query when you use a user-defined SSP cluster.
- You cannot modify a KSYS subsystem's high availability disk after creating the SSP cluster from the KSYS node. If you want to modify the HA disk, you must delete the host group and re-create the host group with the HA disk details.
- After you configure applications in the VMRM environment, if you shut down a virtual machine (VM), the KSYS subsystem does not change the status of the VM to red; the status remains green. However, if you shut down the same VM through the HMC, the KSYS subsystem changes the status of the application to red.
- A maximum of 10 scripts can be added in the KSYS subsystem for the add notify command.
- On VIOS nodes, if the disks of a shared storage pool (SSP) are not accessible after the system is re-activated due to shutdown or reboot, the disk state continues to be down. This affects the start of the pool and requires a quorum to come back online. As a workaround, choose one of the following options. If you do not want to reboot your VIOS, follow workaround option 1.
- Workaround option 1: Complete the following procedure:
- Restore the disk connectivity.
- Run the cfgmgr command as a root user to make the system aware of the disks.
- As the padmin user, run the clstartstop -stop -m <node> command.
- As the padmin user, run the clstartstop -start -m <node> command.
- Workaround option 2: Complete the following procedure:
- Restore the disk connectivity.
- Reboot the VIOS node.
- For a VM with a vSCSI disk, the cleanup operation fails in the local database mode.
- Workaround: You must bring the SSP cluster back to the global mode.
- The KSYS subsystem does not handle the application dependency if the VM has been shut down manually and the dependent application is part of the VM.
- VM Recovery Manager HA does not work if the Live Partition Mobility (LPM) feature is disabled at the firmware level.
- If the current repository disk is down, automatic replacement does not occur on a
previously used repository disk that has the same cluster signature. In this case, a free backup
repository disk might not be available, and the automatic replacement operation therefore fails.
- Workaround: Run the following command to clear the previous cluster signatures:
cleandisk -r <diskname>
- In a scalability environment where the VMs are spread across the hosts of a host
group and the LPM verification operation is run on the host group, many requests might go to one
host at some point, depending on the type of configuration. If the number of requests exceeds the
maximum number of requests that the host can handle, the verification operation might fail with the
following error:
HSCLB401 The maximum number of partition migration commands allowed are already in progress.
- In the KSYS LPAR, if you upgrade the AIX operating system after upgrading the KSYS software, a few class IDs might be missing in the /usr/sbin/rsct/cfg/ct_class_ids file and the KSYS daemon might stop working.
- Workaround: Run the following command to check whether the class IDs are reserved.
cat /usr/sbin/rsct/cfg/ct_class_ids
IBM.VMR_HMC 510
IBM.VMR_CEC 511
IBM.VMR_LPAR 512
IBM.VMR_VIOS 513
IBM.VMR_SSP 514
IBM.VMR_SITE 515
IBM.VMR_SA 516
IBM.VMR_DP 517
IBM.VMR_DG 518
IBM.VMR_KNODE 519
IBM.VMR_KCLUSTER 520
IBM.VMR_HG 521
IBM.VMR_APP 522
IBM.VMR_CLOUD 523
IBM.VMR_DP_CLD 524
IBM.VMR_SA_CLD 525
IBM.VMR_LPAR_CLD 526
IBM.VMR_SITE_CLD 527
IBM.VMR_VMG_CLD 528
IBM.VMR_APP_CLD 529
If any of the class IDs that are displayed in the preceding example are missing in your output, add the missing entries in the /usr/sbin/rsct/cfg/ct_class_ids file to restart the VMR services.
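The comparison in this workaround can also be scripted. The following shell function is a minimal sketch and not part of the product: the name check_vmr_class_ids is hypothetical, and the function assumes that each entry in the file is a class name followed by whitespace and the class ID, as in the listing in this workaround. It only reports missing entries and does not modify the file.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not an IBM tool): list the VMR class IDs that
# this workaround expects but that are absent from the class ID file.
# Assumes each entry is "<class name> <class ID>" separated by whitespace.
check_vmr_class_ids() {
    file="${1:-/usr/sbin/rsct/cfg/ct_class_ids}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    missing=0
    while read -r name id; do
        # Exact match on the first two fields of any line in the file.
        if ! awk -v n="$name" -v i="$id" \
                '$1 == n && $2 == i { found = 1 } END { exit !found }' "$file"; then
            echo "missing: $name $id"
            missing=1
        fi
    done <<'EOF'
IBM.VMR_HMC 510
IBM.VMR_CEC 511
IBM.VMR_LPAR 512
IBM.VMR_VIOS 513
IBM.VMR_SSP 514
IBM.VMR_SITE 515
IBM.VMR_SA 516
IBM.VMR_DP 517
IBM.VMR_DG 518
IBM.VMR_KNODE 519
IBM.VMR_KCLUSTER 520
IBM.VMR_HG 521
IBM.VMR_APP 522
IBM.VMR_CLOUD 523
IBM.VMR_DP_CLD 524
IBM.VMR_SA_CLD 525
IBM.VMR_LPAR_CLD 526
IBM.VMR_SITE_CLD 527
IBM.VMR_VMG_CLD 528
IBM.VMR_APP_CLD 529
EOF
    return "$missing"
}
```

Calling check_vmr_class_ids with no argument checks the default file path; the function returns 0 when every expected entry is present and 1 when at least one entry is missing.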
If the ha_monitor attribute is enabled for a virtual machine by using the KSYS subsystem and you shut down the virtual machine by using the immediate option from the HMC, the KSYS subsystem does not display an error message when you run the discover or verify operation.
For a multi-node KSYS cluster, VM Recovery Manager HA Version 1.7 supports only two KSYS nodes.
To avoid addMS/addVM-related issues, ensure that the ksys_hsmon daemon is not running on any VIOS before the first discovery operation. If you do not clean up the previous cluster properly, the ksys_hsmon daemon remains in an active state. When you create a cluster while the ksys_hsmon daemon is running, addMS/addVM-related issues occur in the KSYS subsystem.
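The check that the ksys_hsmon daemon is not active can be sketched as a small shell function that is run on each VIOS. This is an illustrative sketch only: the function name hsmon_is_active is hypothetical, and it assumes the usual lssrc output layout, in which the first column is the subsystem name and the last column is the status (for example, active or inoperative).

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if the output of
# `lssrc -s ksys_hsmon`, read from standard input, reports the
# subsystem as active.
hsmon_is_active() {
    awk '$1 == "ksys_hsmon" && $NF == "active" { found = 1 }
         END { exit !found }'
}

# Example use on a VIOS, with the commands from the workaround in this topic:
#   if lssrc -s ksys_hsmon | hsmon_is_active; then
#       stopsrc -s ksys_hsmon -c
#   fi
```

Reading the lssrc output from standard input keeps the function free of side effects, so the decision to stop the daemon stays with the caller.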
For a multi-node KSYS setup, it is recommended that the operating system version and the RSCT version be the same on all nodes when you add clusters. If one node runs an earlier version of the operating system than the other node, you must specify the node that runs the earlier version before the node that runs the later version when you run a command to add nodes to the cluster.
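The ordering rule can be illustrated with a short shell sketch. The function name order_nodes_by_oslevel and the input format are hypothetical, chosen only for this example. The sketch assumes that the operating system levels are fixed-width strings such as the output of the AIX oslevel -s command, so that a plain string sort matches version order.

```shell
#!/bin/sh
# Sketch (hypothetical helper): read "<node name> <OS level>" pairs on
# standard input and print the node names, earliest OS level first.
# Assumes fixed-width OS level strings (for example 7200-05-03-2148),
# so that a byte-wise string sort matches version order.
order_nodes_by_oslevel() {
    LC_ALL=C sort -k2,2 | awk '{ print $1 }'
}
```

The first name that is printed is the node to specify first when you add nodes to the cluster.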
KSYS LPM limitations
- You cannot run the Live Partition Mobility (LPM) operation simultaneously on multiple hosts by using the ksysmgr command. You must specify multiple virtual machines, in a comma-separated list, in the ksysmgr command. Also, you can perform the LPM operation on a list of virtual machines simultaneously only if all the virtual machines are present in the same host.
- The flexible capacity policy is applicable only for VM failover operations. The flexible capacity function is not supported for virtual machines that are migrated by using the LPM operation.
- The flexible capacity policy is applicable only on CPU and memory resources. It is not applied on I/O resources. You must ensure enough I/O resources are available in the target host.
- If a VM migrates from host1 to host2 and the applications in the VM become stable, and the VM later needs to be migrated from host2 because of an application failure, host1 is not considered as a backup for the application failure migration because the VM had previously failed on host1. If host1 needs to be considered as a backup for future application failures, use the following workaround.
- Workaround: After the VM is stable on host2, clear the FailedHostList list of the VM by running the following command:
chrsrc -s 'Name="VMName"' IBM.VMR_LPAR VmRestartFailedCecs='{""}'
- The discovery operation or the KSYS restart operation automatically starts the
dependency applications that were stopped by the user before the discovery or the restart of the
KSYS subsystem.
- Workaround: Complete the following procedure:
- Do not perform the discovery operation after stopping the dependency application.
- Disable the auto discover and the quick discovery features.
- Do not perform the KSYS subsystem restart.
VM agent limitations
- The ksysvmmgr start|stop app command supports only one application at a time.
- The ksysvmmgr suspend|resume command is not supported for the applications that are configured in an application dependency setup.
- For all applications that are installed on the non-rootvg disks, you must enable the automatic varyon option for volume groups and the auto mount option for file systems after the virtual machine is restarted on the AIX® operating system.
- If the application is in any of the failure states, for example, NOT_STOPPABLE, NOT_STARTABLE, ABNORMAL, or FAILURE, you must fix the failure issue, and then use the ksysvmmgr start|resume application command to start and monitor the application.
- If the KSYS cluster is deleted, or if a virtual machine is not included for the HA management, the VM agent daemon becomes inoperative. You must manually restart the VM agent daemon in the virtual machine to bring the VM agent daemon to the operative state.
- For VMs that run the Linux VM agent, the restart operation might take longer
than expected, and the rediscovery operation might fail and display the following message:
Rediscovery has encountered error for VM VM_Name
- Workaround: Run the discovery operation after the virtual machine is in the active state.
The state of an application that is running on a VM does not change automatically to normal when the VM is recovered by the KSYS subsystem. The state of the application changes to normal when you run the resume command from the VM.
- Workaround: To resume the application, run the following command:
ksysvmmgr -s resume app <app name>
An output that is similar to the following example is displayed:
Modifying application "App1" into daemon configuration successfully performed.
GUI limitations
- The VM Recovery Manager HA GUI does not support multiple sessions that originate from the same computer.
- The VM Recovery Manager HA GUI does not support duplicate names for host group, HMC, host, VIOS, and VMs. If a duplicate name exists in the KSYS configuration, the GUI might have issues during host group creation or in displaying the dashboard data.
- The VM Recovery Manager HA GUI refreshes automatically after each topology change (for example, VM migration operation and host migration operation). After the refresh operation is complete, the default KSYS dashboard is displayed. You must expand the topology to view the log information in the Activity window for a specific entity.
- Operations that a user performs from the command-line interface of VM Recovery Manager HA are not displayed in the Activity window of the VM Recovery Manager HA GUI.
Miscellaneous
- The VM Recovery Manager HA solution does not support the Internet Small Computer Systems Interface (iSCSI) disk type. Only the N_Port ID virtualization (NPIV) and virtual Small Computer System Interface (vSCSI) disk types are supported.
- In a user-defined SSP cluster, if you want to add a host or VIOS to the environment, you must add it in the shared storage pool (SSP) cluster first. Then, you can add the host or VIOS to the KSYS cluster. Also, if you want to remove a host or VIOS from the environment, you must first remove it from the KSYS cluster and then remove it from the SSP cluster.
- VM Recovery Manager HA supports only detailed-type snapshots.
- After each manage VIOS operation and unmanage VIOS operation, you must perform the discovery operation.
If you have configured an application as critical on a virtual machine, ensure that the KDB option for the virtual machine is disabled.
Errors that the KSYS subsystem cannot handle
The KSYS subsystem automatically restarts the VMs only when the KSYS subsystem is certain of the failures. If the KSYS subsystem is unsure, it sends an alert message to the administrator to review the issue and to manually restart VMs, if required.
Sometimes, the KSYS subsystem cannot identify whether a host failure is real or is caused by
a network partition. The KSYS subsystem does not automatically restart VMs
in the following example scenarios:
- When the KSYS subsystem cannot connect to the HMC to quiesce the failed VM (fencing operation) on the source host before restarting the VM on the target host. The fencing operation is required to ensure that the VM is not running on two hosts simultaneously.
- The host monitor module and the VIOS can monitor their own network and storage. Sometimes, network and storage errors are reported by the VIOS and these error events are notified to the administrator through email and text messages. In these cases, the KSYS subsystem does not move the VMs automatically to avoid false relocation.
- When a host group is spread across two buildings with storage subsystem technologies such as IBM® SAN Volume Controller (SVC) HyperSwap®, where HMCs, hosts, and other required resources exist in each building and the KSYS LPAR is deployed in the backup building, the following scenarios cannot be automatically handled:
- Power failure in the main building: The KSYS subsystem cannot connect to the HMCs and hosts in the main building. The KSYS subsystem detects the host failure and notifies the administrator.
- Issues in network and storage partitioning between the buildings: The KSYS subsystem cannot connect to the HMCs, and therefore notifies the administrator about the host failure. The administrator must review the environment and decide whether to move the VMs. The VMs might be operating correctly on the main host. The administrator can rectify the network links between the hosts and the KSYS subsystem will start operating in normal mode.