Replacing a failed node in a high availability group

If an appliance that belongs to a high availability (HA) group fails, you can replace the appliance and then restore the HA group by following this procedure.

Before you begin

When a node in an HA group fails, the queue managers fail over to the remaining appliance in the group. To restore high availability function after you replace or repair the failed appliance, you must first deconstruct the HA group by running the queue managers stand-alone and deleting the HA group from the remaining appliance. You then create a new HA group, and add the queue managers back to it.

Before you create the new group, you must ensure that both appliances are running the same level of firmware. If your new appliance is running a later version of the firmware, you must either upgrade your existing appliance, or downgrade your new appliance.

Procedure

  1. On the appliance that did not fail, stop each queue manager by using the following command:
    
    endmqm QMname
    
  2. If the queue manager is part of a disaster recovery configuration as well as part of an HA group, you must remove it from the disaster recovery configuration. Use the following command:
    
    dltdrprimary -m QMname
    
  3. Enter the following command to remove a queue manager from the HA group and run it as a stand-alone queue manager. The queue manager must be stopped before you run this command.
    
    sethagrp -e QMname
    
    Where QMname is the name of the queue manager. The queue manager is removed from the HA group. If required, you can use the strmqm command to restart the queue manager and run it in a stand-alone configuration while you replace the failed node.

    Repeat this command for all HA queue managers.

  4. Delete the HA group by entering the following command:
    
    dlthagrp
    
  5. On both the existing appliance and the replacement appliance, create a new HA group by using the prepareha and crthagrp commands, as described in Creating a high availability group.
  6. On the appliance that did not fail, enter the following command to add a queue manager back to the HA group. The queue manager must be stopped before you run this command.
    
    sethagrp -i QMname
    
    Where QMname is the name of the existing queue manager. The queue manager is added to the group and is started. Repeat for all the queue managers that were previously part of the HA group.
  7. Set the preferred appliance for the queue manager by running the following command:
    
    sethapreferred QMname
    
    Repeat this command for each queue manager. Run the command on the appliance that you want to be the preferred location for that queue manager: either the appliance that did not fail, or the replaced or repaired appliance.
  8. If you want to restore disaster recovery capability to any of the queue managers, follow the instructions in Configuring disaster recovery for a high availability queue manager.
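
When the HA group contains several queue managers, it can help to write out the full command sequence before you start. The following Python sketch only builds the command strings from the procedure above in step order; it does not connect to an appliance or run anything. The function names teardown_commands and restore_commands and the queue manager names are hypothetical and used for illustration only. Step 5 (prepareha and crthagrp) is left out because its arguments depend on your configuration, as described in Creating a high availability group.

```python
from typing import Iterable, List


def teardown_commands(queue_managers: Iterable[str],
                      dr_queue_managers: Iterable[str] = ()) -> List[str]:
    """Build the commands for steps 1 to 4 (hypothetical helper).

    All commands are intended for the appliance that did not fail.
    dr_queue_managers names the queue managers that are also part of a
    disaster recovery configuration.
    """
    qms = list(queue_managers)
    dr = set(dr_queue_managers)
    commands: List[str] = []
    # Step 1: stop each queue manager.
    commands += [f"endmqm {qm}" for qm in qms]
    # Step 2: remove DR-configured queue managers from the DR configuration.
    commands += [f"dltdrprimary -m {qm}" for qm in qms if qm in dr]
    # Step 3: remove each (stopped) queue manager from the HA group.
    commands += [f"sethagrp -e {qm}" for qm in qms]
    # Step 4: delete the HA group.
    commands.append("dlthagrp")
    return commands


def restore_commands(queue_managers: Iterable[str]) -> List[str]:
    """Build the commands for steps 6 and 7 (hypothetical helper).

    Assumes the new HA group from step 5 already exists and that each
    queue manager is stopped. The sethapreferred commands must be run on
    whichever appliance is to be the preferred location.
    """
    qms = list(queue_managers)
    commands: List[str] = []
    # Step 6: add each queue manager back to the HA group.
    commands += [f"sethagrp -i {qm}" for qm in qms]
    # Step 7: set the preferred appliance for each queue manager.
    commands += [f"sethapreferred {qm}" for qm in qms]
    return commands


if __name__ == "__main__":
    names = ["QM1", "QM2"]  # example names, not taken from any real system
    for line in teardown_commands(names, dr_queue_managers=["QM2"]):
        print(line)
    print("# step 5: create the new HA group (prepareha, crthagrp)")
    for line in restore_commands(names):
        print(line)
```

Keeping the teardown and restore lists separate mirrors the procedure: the new HA group must be created on both appliances between the two phases.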