Setting up the KSYS subsystem
You can use the ksysmgr command or the VM Recovery Manager HA GUI to interact with the KSYS daemon to manage the entire environment for high availability.
The VM Recovery Manager HA solution monitors the hosts and the virtual machines when you add information about your environment to the KSYS configuration settings. Complete the following steps to set up the KSYS subsystem:
- Step 1: Initialize the KSYS cluster.
- Step 2: Add HMCs.
- Step 3: Add hosts.
- Step 4: Create host groups.
- Optional: Configure virtual machines.
- Optional: Configure VIOS.
- Step 5: Setting contacts for event notification.
- Step 6: Enabling HA monitoring.
- Step 7: Discovering and verifying the KSYS configuration.
- Optional: Backing up the configuration data.
Step 1: Initialize the KSYS cluster
The KSYS environment relies on Reliable Scalable Cluster Technology (RSCT) to create its cluster on the KSYS logical partition (LPAR). After you create the KSYS cluster, various daemons of RSCT and KSYS are activated. The KSYS node can then process the commands that you specify in the command line.
- Configure a cluster and add the KSYS node to the cluster by running the following command:
ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename type=HA
- Verify the KSYS cluster configuration by running the following command:
ksysmgr verify ksyscluster cluster_name
- Deploy the one-node KSYS cluster by running the following command:
ksysmgr sync ksyscluster cluster_name
Note: You can perform steps 1 - 3 by running the following single command:
ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename sync=yes type=HA
This command creates a cluster, adds the KSYS node to the cluster, verifies the cluster configuration, and deploys the one-node KSYS cluster.
- Optional: Verify the KSYS cluster that you created by running the following command:
ksysmgr query ksyscluster
An output that is similar to the following example is displayed:
Name:      ksys_test
State:     Online
Type:      HA
Ksysnodes: ksys_nodename:1:Online
KsysState: ksys_nodename:1:Online
Note: These commands do not display any output until you run the ksysmgr sync command.
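After the sync operation completes, you can optionally confirm that the KSYS daemon is running on the node. The following check is a minimal sketch; it assumes that the KSYS resource manager is registered with RSCT under the subsystem name IBM.VMR, which is typical for VM Recovery Manager installations:
lssrc -s IBM.VMR
If the cluster was deployed successfully, the subsystem is reported as active.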
Creating and initializing a multi-node KSYS cluster
- To create a cluster and to add multiple KSYS nodes to the KSYS cluster, run the following command:
ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename1,ksys_nodename2 type=HA
This command creates a cluster and adds the KSYS nodes that you specified to the cluster.
Note:
- For best performance, the RSCT version must be the same on all KSYS nodes. If the RSCT versions differ, it is recommended that you choose the node that has the lowest RSCT version as the group leader node and create the KSYS cluster on that node.
- This release supports only a one-node or two-node KSYS cluster.
- To use a multi-node KSYS cluster, VM Recovery Manager HA Version 1.7 or later must be installed on all KSYS nodes.
- To verify the KSYS cluster configuration, run the following command:
ksysmgr verify ksyscluster cluster_name
- To deploy the multi-node KSYS cluster, run the following command:
ksysmgr sync ksyscluster cluster_name
Note: You can run the following single command to perform Step 1 to Step 3:
ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename1,ksys_nodename2 sync=yes type=HA
This command creates a cluster, adds the KSYS nodes that you specified to the cluster, verifies the cluster configuration, and deploys the multi-node KSYS cluster.
- Optional: To verify the KSYS cluster that you created, run the following command:
ksysmgr query ksyscluster
An output that is similar to the following example is displayed:
Name:      test_ksys
State:     Online
Type:      HA
Ksysnodes: ksys_nodename1:1:Online(Managing node)
           ksys_nodename2:2:Online
KsysState: ksys_nodename1:1:Online
           ksys_nodename2:2:Online
Note: These commands do not display any output until you run the ksysmgr sync command.
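Because the KSYS cluster is built on an RSCT peer domain, you can also confirm from either node that the peer domain is online and that both KSYS nodes joined it. This optional check is a sketch that uses standard RSCT commands and is not part of the documented procedure:
lsrpdomain
lsrpnode
The lsrpdomain command lists the peer domain and its operational state, and the lsrpnode command lists each node in the domain and whether it is online.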
Modifying a multi-node KSYS cluster
- To remove a KSYS node from the KSYS cluster, run the following command:
ksysmgr modify ksyscluster cluster_name remove ksysnodes=ksys_nodename2
Note: For best performance, the RSCT version must be the same on all KSYS nodes. If the RSCT versions differ, it is recommended that you choose the node that has the lowest RSCT version as the group leader node and create the KSYS cluster on that node.
You can verify the KSYS cluster that you modified by running the ksysmgr query ksyscluster command. An output that is similar to the following example is displayed:
Name:      test_ksys
State:     Online
Type:      HA
Ksysnodes: ksys_nodename1:1:Online
KsysState: ksys_nodename1:1:Online
- To add a KSYS node to the KSYS cluster, run the following command on the group leader node:
ksysmgr modify ksyscluster cluster_name add ksysnodes=ksys_nodename2
You can verify the KSYS cluster that you modified by running the ksysmgr query ksyscluster command. An output that is similar to the following example is displayed:
Name:      test_ksys
State:     Online
Type:      HA
Ksysnodes: ksys_nodename1:1:Online(Managing node)
           ksys_nodename2:2:Online
KsysState: ksys_nodename1:1:Online
           ksys_nodename2:2:Online
Step 2: Add HMCs
- The HMC user, whose user name and password details are provided to the KSYS, must have at least hmcsuperadmin privileges and remote access. The KSYS subsystem uses the Representational State Transfer (REST) API to communicate with the HMCs in the environment. Therefore, ensure that your environment allows HTTPS communication between the KSYS and HMC subsystems.
- Ensure that port 12443 on the HMC is excluded from the firewall so that the KSYS subsystem can communicate with the HMC by using the HMC REST API.
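Before you add an HMC, you can optionally confirm that the KSYS node can reach the HMC REST interface on port 12443. The following sketch assumes that the curl utility is installed on the KSYS LPAR and uses the standard HMC REST logon URI:
curl -k -s -o /dev/null -w "%{http_code}\n" https://hmc1.testlab.ibm.com:12443/rest/api/web/Logon
Any HTTP response code indicates that the HTTPS connection to the HMC REST port succeeded; a value of 000 or a timeout indicates a connectivity or firewall problem.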
- Add an HMC to the KSYS subsystem by running the following command:
ksysmgr add hmc hmcname login=username password=password hostname|ip=hostname|ip
For example, to add an HMC with user name hscroot, password xyz123, and an IP address, run the following command:
ksysmgr add hmc hmc123 login=hscroot password=xyz123 ip=x.x.x.x
To add an HMC with user name hscroot, password xyz123, and host name hmc1.testlab.ibm.com, run the following command:
ksysmgr add hmc hmc123 login=hscroot password=xyz123 hostname=hmc1.testlab.ibm.com
- Repeat step 1 to add multiple HMCs.
- Verify the HMCs that you added by running the following command:
ksysmgr query hmc
An output that is similar to the following example is displayed:
Name:  HMC1
Ip:    9.xx.yy.zz
Login: hscroot
Managed Host List:
Host Name      Uuid
=========      ====
Host1_Site2    82e8fe16-5a9f-3e32-8eac-1ab6cdcd5bcf
Host2_Site2    74931f30-e852-3d47-b564-bd263b68f1b1
Host3_Site2    c15e9b0c-c822-398a-b0a1-6180872c8518
Step 3: Add hosts
After the HMCs are added to the KSYS subsystem, you can review the list of hosts that are managed by each HMC, and then identify the hosts that you want to add to the KSYS subsystem for high availability.
- Add a managed host to the KSYS subsystem by running the following command:
ksysmgr add host hostname [uuid=uuid] [hostname|ip=hostname|ip]
Hosts are identified by their UUIDs as tracked in the HMC. If a host has the same name as another host, you must specify the Universally Unique Identifier (UUID) of the host. You can use the ksysmgr query hmc command to identify the host name and the host UUID. For example, to add a host with host name Host1_HMC1, host UUID Host_UUID1, and IP address 10.x.x.x, run the following command:
ksysmgr add host Host1_HMC1 uuid=Host_UUID1 ip=10.x.x.x
- Repeat step 1 for all hosts that you want to add to the KSYS subsystem.
- Verify the hosts that you added by running the following command:
ksysmgr query host
An output that is similar to the following example is displayed:
Name:          Host2_HMC1
UUID:          Host_UUID1
FspIp:         10.x.x.x
Host_group:    HG1
VIOS:          VIOS1
               VIOS2
HMCs:          HMC1
Proactiveha:   disable
VM_failure_detection_speed: normal
MachineSerial: 1081C16

Name:          Host1_HMC1
UUID:          d935fc75-0ede-3deb-a080-bdd37f228785
FspIp:         10.x.x.x
Host_group:    HG1
VIOS:          VIOS3
               VIOS4
HMCs:          HMC1
Proactiveha:   disable
VM_failure_detection_speed: normal
MachineSerial: 1081C16
Step 4: Create host groups
You can group a set of hosts depending on your business requirements. Each host in the KSYS subsystem must be a part of a host group.
The KSYS subsystem creates a health monitoring Shared Storage Pool (SSP) cluster across the Virtual I/O Servers that are part of the host group. The health cluster monitors the health of all Virtual I/O Servers across the cluster and retains the health data, which is made available to the KSYS subsystem through a VIOS in the host group. The SSP cluster is used only by the KSYS subsystem; you must not use this SSP cluster for any other purpose. You can continue to use the virtual Small Computer System Interface (vSCSI) or N_Port ID Virtualization (NPIV) modes of the cluster. If an SSP cluster already exists in your environment, the KSYS subsystem does not deploy a new SSP cluster and instead uses the existing SSP cluster for health management. However, if an existing SSP cluster is used, the KSYS subsystem might not support VIOS management.
The KSYS subsystem requires two disks to create the health monitoring SSP cluster across the Virtual I/O Servers in the host group. For each host group, a disk of at least 10 GB, called the repository disk, is required to monitor the health of all hosts, and another disk of at least 10 GB, called the HA disk, is required to track the health data. These disks must be accessible to all the managed Virtual I/O Servers on each of the hosts in the host group. You must specify the disk details when you create the host group or before you run the first discovery operation. You cannot modify the HA disk after the discovery operation completes successfully. If you want to modify the HA disk, you must delete the host group and re-create it with the new HA disk details.
VM Recovery Manager HA supports automatic replacement of the repository disk. To enable automatic replacement, you must provide the details of one or more backup repository disks. A maximum of six backup repository disks can be added for automatic replacement. When the storage framework detects the failure of a repository disk, the KSYS subsystem sends an event notification. The KSYS subsystem then searches the backup repository list for a valid and active backup repository disk and replaces the failed repository disk with it, without any interruption. The failed repository disk is placed at the end of the backup repository disk list and can be reused after the disk failure is fixed and the disk becomes valid and active again. The backup repository disk must meet all the VM Recovery Manager HA requirements for the automatic replacement of the repository disk. For more information, see VM Recovery Manager HA requirements.
If no backup repository disk is specified, the automatic replacement feature is disabled. However, a failed repository disk can be replaced manually from the KSYS subsystem. For more information, see Troubleshooting repository disk failure.
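If you want to confirm that a candidate disk meets the 10 GB minimum before you designate it, you can check its size directly on a VIOS. The following sketch uses a standard AIX command from the VIOS root shell (run oem_setup_env from the padmin shell first); hdisk5 is a placeholder disk name:
getconf DISK_SIZE /dev/hdisk5
The command returns the disk size in MB, so a value of at least 10240 satisfies the requirement.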
- Identify the available disks that you can designate as the repository disk and the HA disk for the SSP cluster by running one of the following commands:
  - ksysmgr query viodisks vios=name1[,name2,..]
  - ksysmgr query viodisk hosts=host1[,host2,..]
- Create a host group and add the hosts and disks that you want in this host group by running the following command:
ksysmgr add host_group hgname hosts=host1,host2,... repo_disk=diskuuid1 ha_disk=diskuuid2 backup_repo_disk=diskuuid3,diskuuid4,...
For example, to add a host group with host group name HG1, hosts host1 and host2, repository disk diskuuid1, HA disk diskuuid2, and backup repository disks diskuuid3 and diskuuid4, run the following command:
ksysmgr add host_group HG1 hosts=host1,host2 repo_disk=diskuuid1 ha_disk=diskuuid2 backup_repo_disk=diskuuid3,diskuuid4
For repository disk failure issues, see the Troubleshooting repository disk failure topic.
- Repeat steps 1 and 2 for all host groups that you want to create in the KSYS subsystem.
- Verify the host groups that you created by running the following command:
ksysmgr query host_group
Optional: Configure virtual machines
When you add a host, its virtual machines are managed by the KSYS subsystem by default. To exclude specific virtual machines from HA management, run one of the following commands:
- ksysmgr unmanage vm name=vmname host=hostname | uuid=lparuuid | ALL host=hostname | ALL host_group=hg_name
- ksysmgr unmanage vm vmname1|lparuuid1,...
If you installed the VM agent for HA monitoring at the VM and application level, you can enable HA monitoring by running the ksysvmmgr start command in the virtual machine. For more information about configuring the VM agent, see the Setting up the VM agent topic.
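For example, after the VM agent fileset is installed in a managed virtual machine, the following minimal sketch starts HA monitoring inside that VM and then checks the agent state. The status action is an assumption based on the usual ksysvmmgr actions; confirm it against the ksysvmmgr documentation:
ksysvmmgr start
ksysvmmgr status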
Optional: Configure VIOS
When you add hosts to the KSYS subsystem, all the Virtual I/O Servers in the hosts are also added to the KSYS subsystem. The VM Recovery Manager HA solution monitors the hosts and virtual machines by using Virtual I/O Servers in the host.
The VM Recovery Manager HA solution requires at least two Virtual I/O Servers per host. You can have a maximum of 24 Virtual I/O Servers, spread across different hosts, in a single host group. If a host has more than two Virtual I/O Servers, you can exclude specific VIOS partitions from HA management.
- Exclude a VIOS partition from HA management by running the following command:
ksysmgr unmanage vios viosname
You can include the VIOS partition for HA management again at any time by using the ksysmgr manage vios viosname command.
- Verify the existing Virtual I/O Servers by running the following command:
ksysmgr query vios name
The host monitor monitors the following file systems: /, /tmp, /usr, /var, and /home. When the KSYS subsystem requests file system usage details, the host monitor responds with the usage information for each of these file systems. An event is generated when the usage of a file system exceeds the threshold value, and another event is generated when the usage drops back below the threshold value.
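To see the current usage of these file systems on a particular VIOS, you can check them directly with the standard df command. This is an illustrative sketch; run it as root on the VIOS (enter oem_setup_env from the padmin shell first):
df -g / /tmp /usr /var /home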
Step 5: Setting contacts for event notification
The KSYS subsystem tracks various events that occur in the environment, analyzes the situation, and notifies you about any issues or potential disaster through the registered contacts. You must provide the contact details to the KSYS subsystem so that you can receive notifications about any situation that might need your action.
You can add the following types of contact details to the KSYS subsystem:
- Email address
- Phone number with the phone carrier email address
You can add multiple email addresses for a specific user. However, you cannot add them simultaneously; you must run the command once for each email address.
You must specify the phone number along with the phone carrier email address to receive a short message service (SMS) notification. To find your phone carrier email address, contact your phone service provider.
- To add an email address of a specific user to receive notifications, enter the following command:
ksysmgr add notify user=username contact=email_address
For example:
ksysmgr add notify user=John contact=john.doe@testmail.com
- To add a specific user to receive an SMS notification, enter the following command:
ksysmgr add notify user=username contact=10_digit_phone_number@phone_carrier_email_address
For example:
ksysmgr add notify user=John contact=1234567890@tmomail.net
- To modify the contact information, use the following commands:
ksysmgr modify notify oldcontact=old_username newcontact=new_username
ksysmgr modify notify oldcontact=old_email_address newcontact=new_email_address
For example, to change the user name John to Dave and to change the email address, enter the following commands:
ksysmgr modify notify oldcontact=John newcontact=Dave
ksysmgr modify notify oldcontact=john@gmail.com newcontact=dave@gmail.com
- To delete all the contact information for a specific user, use the following command:
ksysmgr delete notify user=username
For example:
ksysmgr delete notify user=John
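To review the contacts that are currently registered, you can use the ksysmgr query action. The object name in the following sketch is an assumption based on the other ksysmgr query commands in this topic; if this form is not accepted, check the ksysmgr man page:
ksysmgr query notify contact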
Step 6: Enabling HA monitoring
You must enable HA monitoring for the KSYS subsystem to start monitoring the environment.
- Enable HA monitoring at the system level by running the following command:
ksysmgr modify system ha_monitor=enable
- Enable HA monitoring at the VM level for each VM by running the following command:
ksysmgr modify vm vm1[,vm2,...] ha_monitor=enable
- Enable HA monitoring at the host group level by running the following command:
ksysmgr modify host_group <name> options [ha_monitor=<enable | disable>]
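After you enable HA monitoring, you can confirm the current settings. The following sketch assumes that the system-wide and per-VM attributes are visible through the corresponding ksysmgr query commands; vm1 is a placeholder VM name:
ksysmgr query system
ksysmgr query vm vm1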
Step 7: Discovering and verifying the KSYS configuration
After adding various resources (HMCs, hosts, and host groups) to the KSYS subsystem, you must run the discovery operation. During the initial discovery operation, the KSYS subsystem creates the required high availability setup to monitor the VMs and hosts, including an SSP cluster that is based on the information specified in the configuration steps. During subsequent discovery operations, the KSYS subsystem scans the environment for any changes and adapts to the modified environment. For example, when you add a host, or when you run a Live Partition Mobility (LPM) operation from one host to another host that is outside of the current KSYS subsystem, the KSYS configuration settings are updated in the next discovery operation. By default, the KSYS subsystem automatically rediscovers sites once every 24 hours at 00:00 hours. You can change this period by modifying the auto_discover_time system attribute.
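For example, to change the automatic rediscovery schedule, you would modify the auto_discover_time system attribute that is named above. The attribute name comes from this topic, but the accepted value format is not shown here, so treat the value in the following sketch as a placeholder and confirm the format in the ksysmgr documentation:
ksysmgr modify system auto_discover_time=<value>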
After the KSYS subsystem discovers the resources, a verification is required to ensure that the virtual machines can be restarted on another host without any errors during a failover operation. The first discovery operation can take a few minutes because the SSP health cluster is deployed during the first discovery operation.
- Discover the resources by running the following command:
ksysmgr discover host_group hg_name
After you run the discovery operation, the KSYS subsystem creates the SSP cluster and file system in all Virtual I/O Servers of the host group. You might encounter a database corruption error when a virtual machine fails. To avoid a database corruption error, complete the following procedure on any VIOS of the host group:
- To find the pool path, run the following command:
pooladm pool list
An output that is similar to the following example is displayed:
Pool Path
------------------------------
/var/vio/SSP/KSYS_Demo_1_1/D_E_F_A_U_L_T_061310
- Go to the path <pooladm_path>/VIOSCFG/DB/PG. Update the values of the checkpoint_timeout and max_wal_size attributes in the postgresql.conf file as shown in the following example:
checkpoint_timeout = 180s    # range 30s-1d
max_wal_size = 1GB
Note: Ensure that you complete this procedure after a successful discovery operation, and perform it on a VIOS in each host group that is present in the cluster. The procedure updates all Virtual I/O Servers of the host group.
- Verify the resources by running the following command:
ksysmgr verify host_group hg_name
You can also run the discovery and verification operations together by running the following command:
ksysmgr discover host_group hg_name verify=yes
Optional: Backing up the configuration data
You can back up all the current configuration settings of your KSYS environment as a snapshot. A snapshot preserves the configuration details of the KSYS environment at a specific point in time. For example, a snapshot file contains the information about the existing sites, details about the managing HMCs and the managed hosts in a specific site, and the storage device details in the site. You should back up your current configuration settings after you configure the sites, hosts, HMCs, and storage devices initially.
If you save a snapshot of the current KSYS configuration settings, you can restore the configuration settings later by applying the snapshot on the KSYS configuration. The snapshots are useful during node upgrades or environment malfunctions because snapshots eliminate the need to reconfigure the sites, hosts, HMCs, and storage devices. For example, if the KSYS node must be reinstalled, you can use a snapshot and do not have to re-create sites, hosts, and other resources.
You can save the following types of snapshots:
- Detailed configuration data (DETAILED)
Note: To restore a DETAILED snapshot, the version of VM Recovery Manager HA must be the same as the version that was used when the snapshot was captured. The snapshot must be captured on the home site only, and must be restored only on the home site. When the snapshot is restored, ensure that all VMs are on the home site.
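For example, a minimal sketch of saving a snapshot of the current configuration follows. The file path is a placeholder, and the exact ksysmgr snapshot syntax, including whether the type parameter is required, should be confirmed in the ksysmgr man page:
ksysmgr add snapshot filepath=/home/ksys/snapshots/ksys_config type=DETAILED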