Setting up the KSYS subsystem

You can use the ksysmgr command or the VM Recovery Manager HA GUI to interact with the KSYS daemon to manage the entire environment for high availability.

The VM Recovery Manager HA solution monitors the hosts and the virtual machines when you add information about your environment to the KSYS configuration settings. Complete the following steps to set up the KSYS subsystem:

  1. Step 1: Initialize the KSYS cluster.
  2. Step 2: Add HMCs.
  3. Step 3: Add hosts.
  4. Step 4: Create host groups.
  5. Optional: Configure virtual machines.
  6. Optional: Configure VIOS.
  7. Step 5: Set contacts for event notification.
  8. Step 6: Enable HA monitoring.
  9. Step 7: Discover and verify the KSYS configuration.
  10. Optional: Back up the configuration data.

Step 1: Initialize the KSYS cluster

The KSYS environment relies on Reliable Scalable Cluster Technology (RSCT) to create its cluster on the KSYS logical partition (LPAR). After you create the KSYS cluster, various daemons of RSCT and KSYS are activated. The KSYS node can then process the commands that you specify in the command line.

To create and initialize a KSYS cluster, complete the following steps on the KSYS LPAR:
  1. Configure a cluster and add the KSYS node to the cluster by running the following command:
    ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename type=HA
  2. Verify the KSYS cluster configuration by running the following command:
    ksysmgr verify ksyscluster cluster_name
  3. Deploy the one-node KSYS cluster by running the following command.
    ksysmgr sync ksyscluster cluster_name
    Note: You can perform steps 1-3 by running the following command:
    ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename sync=yes type=HA
    This command creates a cluster, adds the KSYS node to the cluster, verifies the cluster configuration, and deploys the one-node KSYS cluster.
  4. Optional: Verify the KSYS cluster that you created by running the following command:
    ksysmgr query ksyscluster
    An output that is similar to the following example is displayed:
    Name:            ksys_test
    State:             Online
    Type:              HA
    Ksysnodes:         ksys_nodename:1:Online
    KsysState:         ksys_nodename:1:Online
    Note: These commands do not display any output until you run the ksysmgr sync command.
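The steps above can be scripted end to end. The following sketch is illustrative only: the cluster and node names are placeholders, and the DRY_RUN flag makes the script echo the ksysmgr commands instead of executing them on a real KSYS LPAR.

```shell
#!/bin/sh
# Sketch: build the ksysmgr commands for a one-node KSYS cluster.
# CLUSTER and NODE are placeholder values; DRY_RUN=1 prints the
# commands instead of executing them on a real KSYS LPAR.
CLUSTER=${CLUSTER:-ksys_test}
NODE=${NODE:-ksysnode1}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Single combined command: create, add the node, verify, and deploy.
run ksysmgr add ksyscluster "$CLUSTER" ksysnodes="$NODE" sync=yes type=HA
run ksysmgr query ksyscluster
```

Set DRY_RUN=0 only on a KSYS LPAR where the ksysmgr command is installed.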

Creating and initializing a multi-node KSYS cluster

To create and initialize a multi-node KSYS cluster, complete the following steps:
  1. To create a cluster and to add multiple KSYS nodes to the KSYS cluster, run the following command:
    ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename1,ksys_nodename2 type=HA
    This command creates a cluster, adds the KSYS nodes that you specified to the cluster, verifies the cluster configuration, and deploys the multi-node KSYS cluster.
    Note:
    • For best performance, the RSCT version must be the same on all KSYS nodes. If the versions differ, it is recommended that you choose the node that has the lowest RSCT version as the group leader node and create the KSYS cluster on that node.
    • This release supports only a one-node or two-node KSYS cluster.
    • To use a multi-node KSYS cluster, VM Recovery Manager HA Version 1.7, or later, must be installed on all KSYS nodes.
  2. To verify the KSYS cluster configuration, run the following command:
    ksysmgr verify ksyscluster cluster_name
  3. To deploy the multi-node KSYS cluster, run the following command:
    ksysmgr sync ksyscluster cluster_name
    Note: You can run the following single command to perform Step 1 to Step 3:
    ksysmgr add ksyscluster cluster_name ksysnodes=ksys_nodename1,ksys_nodename2
          sync=yes type=HA
    This command creates a cluster, adds multiple KSYS nodes that you specified in the command to the cluster, verifies the cluster configuration, and deploys the multi-node KSYS cluster.
  4. Optional: To verify the KSYS cluster that you created, run the following command:
    ksysmgr query ksyscluster
    An output that is similar to the following example is displayed:
    Name:                test_ksys
    State:               Online
    Type:                HA
    Ksysnodes:           ksys_nodename1:1:Online(Managing node)
                         ksys_nodename2:2:Online
    KsysState:           ksys_nodename1:1:Online
                         ksys_nodename2:2:Online
    Note: These commands do not display any output until you run the ksysmgr sync command.

Modifying a multi-node KSYS cluster

You can modify a KSYS cluster to add multiple KSYS nodes and to remove multiple KSYS nodes.
  • To remove a KSYS node from the KSYS cluster, run the following command:
    ksysmgr modify ksyscluster cluster_name remove ksysnodes=ksys_nodename2
    Note: For best performance, the RSCT version must be the same on all KSYS nodes. If the versions differ, it is recommended that you choose the node that has the lowest RSCT version as the group leader node and create the KSYS cluster on that node.
    You can verify the KSYS cluster that you modified by running the ksysmgr query ksyscluster command. An output that is similar to the following example is displayed:
    
    Name:      test_ksys 
    State:     Online 
    Type:      HA 
    Ksysnodes: ksys_nodename1:1:Online
    KsysState: ksys_nodename1:1:Online
  • To add a KSYS node to the KSYS cluster, run the following command on the group leader node:
    ksysmgr modify ksyscluster cluster_name add ksysnodes=ksys_nodename2
    You can verify the KSYS cluster that you modified by running the ksysmgr query ksyscluster commands. An output that is similar to the following example is displayed:
    
    Name:         test_ksys 
    State:        Online 
    Type:         HA 
    Ksysnodes:    ksys_nodename1:1:Online(Managing node) 
                  ksys_nodename2:2:Online
    KsysState:    ksys_nodename1:1:Online
                  ksys_nodename2:2:Online
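The query output shown above is easy to check in a script. The following sketch parses a saved copy of the ksysmgr query ksyscluster report (the here-string reproduces the sample output from this section; in practice you would pipe the live command output instead) and extracts the cluster state:

```shell
#!/bin/sh
# Sketch: extract the State field from saved `ksysmgr query ksyscluster`
# output. The report text below is the sample from this section, not
# live command output.
report='Name:         test_ksys
State:        Online
Type:         HA
Ksysnodes:    ksys_nodename1:1:Online(Managing node)
              ksys_nodename2:2:Online'

state=$(printf '%s\n' "$report" | awk '/^State:/ {print $2}')
echo "cluster state: $state"
```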

Step 2: Add HMCs

The KSYS interacts with the HMC for discovery, verification, monitoring, recovery, and cleanup operations. HMCs provide details about the hosts and VIOS partitions that they manage. The VM Recovery Manager HA solution cannot be implemented without configuring the HMCs.
Note:
  • The HMC user, whose user name and password details are provided to the KSYS, must have at least hmcsuperadmin privileges and remote access. The KSYS subsystem uses the Representational State Transfer (REST) API to communicate with the HMCs in the environment. Therefore, ensure that your environment allows HTTPS communication between the KSYS and HMC subsystems.
  • Ensure that port 12443 on HMC is excluded from the firewall so that the KSYS subsystem can communicate with the HMC using the HMC REST API.
To add the HMCs to the KSYS configuration setting, complete the following steps in the KSYS LPAR:
  1. Add an HMC by running the following command:
    ksysmgr add hmc hmcname
          login=username
          password=password
          hostname|ip=hostname|ip
    For example, to add an HMC with user name hscroot and an IP address, run the following command:
    ksysmgr add hmc hmc123 login=hscroot password=xyz123 ip=x.x.x.x
    To add an HMC with user name hscroot and host name hmc1.testlab.ibm.com, run the following command:
    ksysmgr add hmc hmc123 login=hscroot password=xyz123 hostname=hmc1.testlab.ibm.com
  2. Repeat step 1 to add multiple HMCs.
  3. Verify the HMCs that you have added by running the following command:
    ksysmgr query hmc
    An output that is similar to the following example is displayed:
    
    Name:    HMC1
    Ip:      9.xx.yy.zz
    Login:   hscroot
              Managed Host List:
    Host Name              Uuid
    =========              ====
    Host1_Site2            82e8fe16-5a9f-3e32-8eac-1ab6cdcd5bcf
    Host2_Site2            74931f30-e852-3d47-b564-bd263b68f1b1
    Host3_Site2            c15e9b0c-c822-398a-b0a1-6180872c8518
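When several HMCs must be registered, step 2 is often repeated in a loop. The following sketch uses placeholder HMC names and host names, and echoes the commands rather than running them; real credentials should come from a secure store, not a script variable.

```shell
#!/bin/sh
# Sketch: repeat `ksysmgr add hmc` for a list of HMCs.
# Entries are name:hostname pairs (placeholders). The password is
# masked and the commands are echoed only, not executed.
HMCS="hmc123:hmc1.testlab.ibm.com hmc456:hmc2.testlab.ibm.com"
LOGIN=hscroot

for entry in $HMCS; do
    name=${entry%%:*}    # text before the first colon
    host=${entry#*:}     # text after the first colon
    echo ksysmgr add hmc "$name" login="$LOGIN" password='********' hostname="$host"
done
```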

Step 3: Add hosts

After the HMCs are added to the KSYS subsystem, you can review the list of hosts that are managed by each HMC, and then identify the hosts that you want to add to the KSYS subsystem for high availability.

To add hosts to the KSYS configuration, complete the following steps in the KSYS LPAR:
  1. Add the managed host to the KSYS by running the following command:
    ksysmgr add host hostname
          [uuid=uuid]
          [hostname|ip=hostname|ip]
    If the host has the same name as another host, you must specify the Universally Unique Identifier (UUID) of the host. Each host is identified by its UUID as tracked in the HMC. You can also use the ksysmgr query hmc command to identify the host name and the host UUID. For example, to add a host with host name Host1_HMC1, host UUID Host_UUID1, and IP address 10.x.x.x, run the following command:
    ksysmgr add host Host1_HMC1
          uuid=Host_UUID1
          ip=10.x.x.x
  2. Repeat step 1 for all hosts that you want to add to the KSYS subsystem.
  3. Verify the hosts that you added by running the following command:
    ksysmgr query host
    An output that is similar to the following example is displayed:
    Name:                       Host2_HMC1
    UUID:                       Host_UUID1
    FspIp:                      10.x.x.x
    Host_group:                 HG1
    VIOS:                       VIOS1
                                VIOS2
    HMCs:                       HMC1
    Proactiveha:                disable
    VM_failure_detection_speed: normal
    MachineSerial:              1081C16
    Name:                       Host1_HMC1
    UUID:                       d935fc75-0ede-3deb-a080-bdd37f228785
    FspIp:                      10.x.x.x
    Host_group:                 HG1
    VIOS:                       VIOS3
                                VIOS4
    HMCs:                       HMC1
    Proactiveha:                disable
    VM_failure_detection_speed: normal
    MachineSerial:              1081C16
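The ksysmgr query host report can also be consumed by scripts. The following sketch pulls each host name and its host group from a saved copy of the output (the report text is an abbreviated version of the sample above):

```shell
#!/bin/sh
# Sketch: map host names to host groups from saved `ksysmgr query host`
# output. The report below is abbreviated sample text, not live output.
report='Name:                       Host2_HMC1
Host_group:                 HG1
Name:                       Host1_HMC1
Host_group:                 HG1'

pairs=$(printf '%s\n' "$report" | awk '
    /^Name:/       { name = $2 }
    /^Host_group:/ { print name "=" $2 }')
echo "$pairs"
```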

Step 4: Create host groups

You can group a set of hosts depending on your business requirements. Each host in the KSYS subsystem must be a part of a host group.

The KSYS subsystem creates a health monitoring Shared Storage Pool (SSP) cluster across the Virtual I/O Servers that are part of the host group. The health cluster monitors the health of all Virtual I/O Servers across the cluster and makes the collected health data available to the KSYS subsystem through a VIOS in the host group. This SSP cluster is used only by the KSYS subsystem; you must not use it for any other purpose. You can continue to use the virtual Small Computer System Interface (vSCSI) or N_Port ID Virtualization (NPIV) modes of the cluster. If an SSP cluster already exists in your environment, the KSYS subsystem does not deploy a new SSP cluster and instead uses the existing SSP cluster for health management. However, if an existing SSP cluster is used, the KSYS subsystem might not support VIOS management.

The KSYS subsystem requires two disks to create the health monitoring SSP cluster across the Virtual I/O Servers in the host group. A disk of at least 10 GB, called the repository disk, is required to monitor the health of all hosts, and another disk of at least 10 GB, called the HA disk, is required to track the health data for each host group. These disks must be accessible to all the managed Virtual I/O Servers on each of the hosts in the host group. You must specify the disk details when you create the host group or before you run the first discovery operation. You cannot modify the HA disk after the discovery operation runs successfully. If you want to modify the HA disk, you must delete the host group and re-create it with the new HA disk details.

VM Recovery Manager HA supports automatic replacement of the repository disk. To enable automatic replacement, you must provide details about the backup repository disks. A maximum of six backup repository disks can be added for automatic replacement. When the storage framework detects the failure of a repository disk, the KSYS subsystem sends an event notification. The KSYS subsystem then searches the backup repository list for a valid and active backup repository disk and replaces the failed repository disk with it without any interruption. The failed repository disk is placed as the last disk in the backup repository disk list and can be reused after the disk failure is fixed and the disk becomes valid and active again. The backup repository disk must meet all the VM Recovery Manager HA requirements for automatic replacement of the repository disk. For more information, see VM Recovery Manager HA requirements.

If no backup repository disk is specified, the automatic replacement feature is disabled. However, a failed repository disk can be replaced manually from the KSYS subsystem. For more information, see Troubleshooting repository disk failure.

To create a host group in the KSYS subsystem, complete the following steps in the KSYS LPAR:
  1. Identify the available disks that you can designate as the repository disk and the HA disk for the SSP cluster and run one of the following commands:
    • ksysmgr query viodisk vios=name1[,name2,..]
    • ksysmgr query viodisk hosts=host1[,host2,..]
  2. Create a host group and add the hosts and disks that you want in this host group by running the following command:
    ksysmgr add host_group hgname hosts=host1,host2,… repo_disk=diskuuid1 ha_disk=diskuuid2 backup_repo_disk=diskuuid3,diskuuid4…
    For example, to add a host group with host group name HG1, hosts host1 and host2, repository disk diskuuid1, HA disk diskuuid2, and backup repository disks diskuuid3 and diskuuid4, run the following command:
    ksysmgr add host_group HG1 hosts=host1,host2 repo_disk=diskuuid1 ha_disk=diskuuid2 backup_repo_disk=diskuuid3,diskuuid4
    For information about repository disk failures, see the Troubleshooting repository disk failure topic.
  3. Repeat steps 1 and 2 for all host groups that you want to create in the KSYS subsystem.
  4. Verify the host groups that you created by running the following command:
    ksysmgr query host_group
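The host-group creation command grows with the number of hosts and backup disks, so assembling it in a script helps avoid typos. The following sketch uses the placeholder names and disk UUIDs from the example above and echoes the final command instead of running it:

```shell
#!/bin/sh
# Sketch: assemble the `ksysmgr add host_group` command from lists.
# All names and disk UUIDs below are placeholders from the example
# in this section; the command is echoed, not executed.
HG=HG1
HOSTS="host1,host2"
REPO=diskuuid1
HA=diskuuid2
BACKUPS="diskuuid3,diskuuid4"

cmd="ksysmgr add host_group $HG hosts=$HOSTS repo_disk=$REPO ha_disk=$HA backup_repo_disk=$BACKUPS"
echo "$cmd"
```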

Optional: Configure virtual machines

When a host is added to the KSYS subsystem, all the virtual machines in the host are included by default in the HA management. If you do not want high availability for any of the virtual machines, you can exclude specific virtual machines from the HA management by running one of the following commands:
  • ksysmgr unmanage vm name=vmname host=hostname | uuid=lparuuid | 
                        ALL host=hostname | ALL host_group=hg_name
  • ksysmgr unmanage vm vmname1|lparuuid1,...
Note: To manage or unmanage a VM before the first discovery operation is run, you cannot use the ALL option with a host or host group.
You can include the VM back in the HA management at any time by using the ksysmgr manage vm command.

If you installed the VM agent for HA monitoring at the VM and application level, you can enable the HA monitoring by running the ksysvmmgr start command in the virtual machine. For more information about configuring the VM agent, see the Setting up the VM agent topic.

Optional: Configure VIOS

When you add hosts to the KSYS subsystem, all the Virtual I/O Servers in the hosts are also added to the KSYS subsystem. The VM Recovery Manager HA solution monitors the hosts and virtual machines by using Virtual I/O Servers in the host.

The VM Recovery Manager HA solution requires at least two Virtual I/O Servers per host. You can have a maximum of 24 Virtual I/O Servers, spread across different hosts, in a single host group. If a host has more than two Virtual I/O Servers, you can exclude specific VIOS partitions from the HA management.

To exclude specific VIOS partitions from the HA management, complete the following steps:
  1. Run the following command:
    ksysmgr unmanage vios viosname
    You can include the VIOS partition for the HA management at any time by using the ksysmgr manage vios viosname command.
  2. Verify the existing Virtual I/O Servers by running the following command:
    ksysmgr query vios viosname
You can configure a specific LPAR and VIOS such that during each discovery operation, the KSYS subsystem fetches the size of the VIOS file system and the current file system usage in the VIOS. When the percentage of file system usage reaches the threshold value of 80%, the KSYS subsystem notifies you with a warning message so that you can make necessary updates to the VIOS file system.

The host monitor monitors the following file systems: /, /tmp, /usr, /var, and /home. When the KSYS subsystem requests the file system usage details, the host monitor responds with details about each file system and its usage. An event is generated when the file system usage surpasses the threshold value, and another event is generated when the usage drops back below the threshold value.
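The 80% threshold behavior can be mimicked locally to see which file systems would trigger a warning. The following sketch evaluates usage figures against the threshold; the usage values are invented sample data, whereas on a VIOS the numbers would come from actual file system statistics (for example, df output):

```shell
#!/bin/sh
# Sketch: flag monitored file systems whose usage reaches the 80%
# threshold. The usage values below are invented sample data.
THRESHOLD=80
usage='/ 45
/tmp 82
/usr 60
/var 91
/home 12'

alerts=$(printf '%s\n' "$usage" | awk -v t="$THRESHOLD" '$2+0 >= t+0 { print $1 }')
echo "over threshold: $alerts"
```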

Step 5: Set contacts for event notification

The KSYS subsystem tracks various events that occur in the environment, analyzes the situation, and notifies you about any issues or potential disaster through the registered contacts. You must provide the contact details to the KSYS subsystem so that you can receive notifications about any situation that might need your action.

You can add the following contact information for a specific user:
  • Email address
  • Phone number with phone carrier email address

You can add multiple email addresses for a specific user. However, you cannot add multiple email addresses simultaneously. You must run the command multiple times to add multiple email addresses.

You must specify the phone number along with the phone carrier email address to receive a short message service (SMS) notification. To find your phone carrier email address, contact your phone service provider.

Note: The logical partition, in which the KSYS subsystem software is installed, must have a public IP address to send the event notifications successfully.
To register contact details to receive notification from the KSYS, run the following commands in the KSYS LPAR:
  • To add an email address of a specific user to receive notification, enter the following command:
    ksysmgr add notify user=username contact=email_address
    For example,
    ksysmgr add notify user=John contact=john.doe@testmail.com 
  • To add a specific user to receive an SMS notification, enter the following command:
    ksysmgr add notify user=username 
    contact=10_digit_phone_number@phone_carrier_email_address
    For example,
    ksysmgr add notify user=John contact=1234567890@tmomail.net
  • To modify the contact information, use the following commands:
    ksysmgr modify notify oldcontact=old_username newcontact=new_username
    ksysmgr modify notify oldcontact=old_email_address newcontact=new_email_address
    For example, to change the user name of John to Dave, and to change the email address, enter the following command:
    ksysmgr modify notify oldcontact=John newcontact=Dave
    ksysmgr modify notify oldcontact=john@gmail.com newcontact=dave@gmail.com
  • To delete all the contact information for a specific user, use the following command:
    ksysmgr delete notify user=username
    For example,
    ksysmgr delete notify user=John
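For SMS notifications, the contact value is just the phone number joined to the carrier email domain. The following sketch composes that value from placeholder inputs and echoes the resulting ksysmgr command:

```shell
#!/bin/sh
# Sketch: compose the SMS contact value from a phone number and a
# phone carrier email domain (both placeholders), then echo the
# ksysmgr command that would register it.
USER=John
PHONE=1234567890
CARRIER=tmomail.net

contact="${PHONE}@${CARRIER}"
echo ksysmgr add notify user="$USER" contact="$contact"
```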

Step 6: Enable HA monitoring

You must enable HA monitoring for the KSYS subsystem to start monitoring the environment.

To enable HA monitoring, complete the following steps:
  1. Enable HA monitoring at system-level by running the following command:
    ksysmgr modify system ha_monitor=enable
  2. Enable HA monitoring at VM-level for each VM by running the following command:
    ksysmgr modify vm vm1[,vm2,...] ha_monitor=enable
  3. Enable HA monitoring at the host group level by running the following command:
    ksysmgr modify host_group hg_name options ha_monitor=enable
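When many VMs must be enabled individually, step 2 is commonly repeated in a loop. The following sketch uses placeholder VM names and echoes the commands instead of executing them:

```shell
#!/bin/sh
# Sketch: enable HA monitoring at the system level and then for each
# VM in a list (placeholder names). Commands are echoed, not executed.
VMS="vm1 vm2 vm3"

echo ksysmgr modify system ha_monitor=enable
for vm in $VMS; do
    echo ksysmgr modify vm "$vm" ha_monitor=enable
done
```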

Step 7: Discover and verify the KSYS configuration

After you add the various resources (HMCs, hosts, and host groups) to the KSYS subsystem, you must run the discovery operation. During the initial discovery operation, the KSYS subsystem creates the required high availability setup to monitor the VMs and hosts. The KSYS subsystem creates an SSP cluster based on the information that is specified in the configuration steps. During any subsequent discovery operation, the KSYS subsystem scans the environment for changes and adapts to the modified environment. For example, when you add a host or when you run a Live Partition Mobility (LPM) operation from one host to another host that is outside of the current KSYS subsystem, the KSYS configuration settings are updated in the next discovery operation. By default, the KSYS subsystem automatically rediscovers sites once every 24 hours at 00:00 hours. You can change this period by modifying the auto_discover_time system attribute.

After the KSYS subsystem discovers the resources, a verification is required to ensure that the virtual machines can be restarted on another host without any errors during a failover operation. The first discovery operation can take a few minutes because the SSP health cluster is deployed during the first discovery operation.

To discover and verify the configuration for a specific host group, complete the following steps:
  1. Discover the resources by running the following command:
    ksysmgr discover host_group hg_name
    After you run the discovery operation, the KSYS subsystem creates the SSP cluster and file system in all Virtual I/O Servers of the host group. You might encounter a database corruption error when a virtual machine fails. To avoid this error, complete the following procedure on any VIOS of the host group:
    1. To find the pool path, run the following command:
      pooladm pool list
      An output that is similar to the following example is displayed:
      Pool Path
       ------------------------------ 
      /var/vio/SSP/KSYS_Demo_1_1/D_E_F_A_U_L_T_061310 
    2. Go to the path <pooladm_path>/VIOSCFG/DB/PG. Update the values of the checkpoint_timeout and max_wal_size attributes in the postgresql.conf file as shown in the following example:
      checkpoint_timeout = 180s              #range 30s-1d
      max_wal_size = 1GB
    Note: Complete this procedure only after a successful discovery operation, and perform it on a VIOS that belongs to each host group that is present in the cluster. The procedure updates all Virtual I/O Servers of the host group.
  2. Verify the resources by running the following command:
    ksysmgr verify host_group hg_name
You must run the discovery and verification commands each time you modify the resources in the KSYS subsystem. To perform both the discovery and verification operations, run the following command:
ksysmgr discover host_group hg_name verify=yes
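Because discovery and verification must be rerun after every configuration change, they are often wrapped in a small helper. The following sketch shows one way to do that; the host group name is a placeholder, and with DRY_RUN=1 the helper echoes the combined command instead of executing it:

```shell
#!/bin/sh
# Sketch: helper that runs discovery with verification for a host
# group and reports failure. HG is a placeholder; DRY_RUN=1 echoes
# the command instead of executing it on a real KSYS LPAR.
HG=${HG:-HG1}
DRY_RUN=${DRY_RUN:-1}

discover_and_verify() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo ksysmgr discover host_group "$1" verify=yes
    else
        ksysmgr discover host_group "$1" verify=yes || {
            echo "discovery/verification failed for $1" >&2
            return 1
        }
    fi
}

discover_and_verify "$HG"
```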

Optional: Back up the configuration data

You can back up all the current configuration settings of your KSYS environment as a snapshot. A snapshot preserves the configuration details of the KSYS environment at a specific point in time. For example, a snapshot file contains the information about the existing sites, details about the managing HMCs and the managed hosts in a specific site, and the storage device details in the site. You should back up your current configuration settings after you configure the sites, hosts, HMCs, and storage devices initially.

If you save a snapshot of the current KSYS configuration settings, you can restore the configuration settings later by applying the snapshot on the KSYS configuration. The snapshots are useful during node upgrades or environment malfunctions because snapshots eliminate the need to reconfigure the sites, hosts, HMCs, and storage devices. For example, if the KSYS node must be reinstalled, you can use a snapshot and do not have to re-create sites, hosts, and other resources.

You can save the following types of snapshots:

Detailed configuration data
To restore the DETAILED type of snapshot, the version of VM Recovery Manager HA must be the same as the version when the snapshot was captured. The snapshot must be captured on the home site only, and must be restored only on the home site. When the snapshot is restored, ensure that all VMs are on the home site.
If you capture a snapshot on a lower version of VM Recovery Manager HA, you cannot restore the configuration settings on a higher version of VM Recovery Manager HA.
Note: Do not restore or save a snapshot when an operation is running on the KSYS node.