Operating in a DR/HA environment

You can fail over from a high availability (HA) queue manager to another HA group at the disaster recovery (DR) site.

For details of how to configure appliances on two sites so that you can implement this scenario, see Configuring disaster recovery fail over to another high availability group. The following sections give instructions on how to operate in this configuration.

Failing over to the recovery site

If you lose the main site, where your HA queue manager is running, the first step is to fail the queue manager over to the recovery site and start the DR secondary instance. Complete the following steps:
  1. Log in to the recovery appliance as a user in the administrators group.
  2. Type the following command to enter IBM® MQ administration mode:
    
    mqcli
    
  3. Run the following command:
    
    makedrprimary -m QMname
    

    Where QMname is the name of the queue manager.

    If the state of the queue manager is inconsistent when it starts (that is, replication failed from the main site), the queue manager reverts to the previous saved snapshot of its data.

  4. Run the following command to start the queue manager:
    
    strmqm QMname
    
Ensure that your applications reconnect to the queue manager on the recovery appliance. If you defined your channels with a list of alternative connection names that specifies both your primary and secondary queue managers, your applications automatically connect to the new primary queue manager.
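
As a minimal sketch of what such a connection name list looks like, the following Python helper joins host and port pairs into the comma-separated host(port) form that IBM MQ accepts in the CONNAME attribute of a channel. The function name, host names, and port are hypothetical placeholders and are not taken from this configuration.

```python
def build_conname(endpoints):
    """Join (host, port) pairs into a comma-separated connection name list.

    The entries are written in the order given, so list the endpoints in
    the order in which you want clients to try them.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return ",".join(f"{host}({port})" for host, port in endpoints)


# Hypothetical addresses: the main-site HA appliances first,
# then the recovery appliance.
conname = build_conname([
    ("mainappl1.example.com", 1414),
    ("mainappl2.example.com", 1414),
    ("drappl.example.com", 1414),
])
print(conname)
```

The resulting string is the value that you would supply as the connection name when you define the client connection channel.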

Running the queue manager at this stage lets you confirm that it works correctly before you add it to the HA group.

Note: If your queue manager is still running on the original site, stop it to avoid split-brain problems when you restore your configuration.

Removing the queue manager from the DR configuration

Before you can add the queue manager to the HA group at the recovery site, you must first remove the DR configuration. Complete the following steps:
  1. Enter the IBM MQ administration mode by entering the following command:
    mqcli
  2. Stop the queue manager:
    endmqm QMname
  3. Enter the following command to remove the queue manager from the disaster recovery configuration:
    dltdrprimary -m QMname

Adding the queue manager to the HA group

Enter the following command to add the queue manager to the HA group:

    sethagrp -i QMname

The queue manager is started automatically.
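
Taken together, the recovery-site commands from this section and the two preceding sections run in a fixed order. The following Python sketch only assembles that sequence as text, for example for a runbook or a review. It does not connect to an appliance, and the function name and the QM1 placeholder are illustrative assumptions.

```python
def recovery_site_commands(qm_name):
    """Return the recovery-site commands from the sections above, in order."""
    if not qm_name or any(ch.isspace() for ch in qm_name):
        raise ValueError("queue manager name must be non-empty with no whitespace")
    return [
        # Failing over to the recovery site
        f"makedrprimary -m {qm_name}",
        f"strmqm {qm_name}",
        # Removing the queue manager from the DR configuration
        f"endmqm {qm_name}",
        f"dltdrprimary -m {qm_name}",
        # Adding the queue manager to the HA group (starts it automatically)
        f"sethagrp -i {qm_name}",
    ]


for command in recovery_site_commands("QM1"):  # QM1 is a placeholder name
    print(command)
```

Enter each command in IBM MQ administration mode (mqcli) on the recovery appliance, and check the result of each one before you run the next.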

Restoring the DR configuration to your original site

When problems have been resolved and your original site is operational again, your first step is to reconfigure disaster recovery between your two sites. You make the HA queue manager running on your recovery site the DR primary, and create a DR secondary on your restored main site.

First you delete the original queue manager on your recovered main site:
  1. Log in to the main appliance as a user in the administrators group.
  2. Type the following command to enter IBM MQ administration mode:
    
    mqcli
    
  3. Remove the DR configuration of the queue manager by completing the following steps:
    1. Run the following command to establish whether the queue manager currently has the DR primary or DR secondary role:
      status QMname
    2. If the queue manager has the DR primary role, run the following command to remove the DR configuration:
      dltdrprimary -m QMname
    3. If the queue manager has the DR secondary role, run the following command to remove the DR configuration:
      dltdrsecondary -m QMname
  4. Remove the HA queue manager from the HA group on your restored main site:
    sethagrp -e QMname
  5. Delete the HA queue manager:
    dltmqm QMname
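
The choice in steps 3 to 5 depends only on the DR role that the status command reports. The following Python sketch shows that decision. It assumes that you read the role from the status output yourself and pass it in as "primary" or "secondary"; the sketch does not parse the output, and the function name is hypothetical.

```python
def main_site_cleanup_commands(qm_name, dr_role):
    """Return the commands for steps 3 to 5, given the reported DR role."""
    delete_by_role = {
        "primary": f"dltdrprimary -m {qm_name}",
        "secondary": f"dltdrsecondary -m {qm_name}",
    }
    role = dr_role.strip().lower()
    if role not in delete_by_role:
        raise ValueError(f"unknown DR role: {dr_role!r}")
    return [
        delete_by_role[role],      # remove the DR configuration
        f"sethagrp -e {qm_name}",  # remove the queue manager from the HA group
        f"dltmqm {qm_name}",       # delete the queue manager
    ]


# QM1 is a placeholder queue manager name.
for command in main_site_cleanup_commands("QM1", "secondary"):
    print(command)
```
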
You then make the queue manager running in the HA group on your recovery appliance a DR primary:
  1. Log in to the recovery appliance as a user in the administrators group.
  2. Type the following command to enter IBM MQ administration mode:
    
    mqcli
    
  3. Do not stop the HA queue manager. The underlying high availability system handles the ending of the queue manager. (If you end the HA queue manager manually, the crtdrprimary command might hang. If that happens, contact IBM Support.)
  4. Specify that the queue manager is the primary instance in a disaster recovery configuration and optionally include a floating IP address that can be used by either of the appliances in the HA pair:
    
    crtdrprimary -m QMname -r RecoveryName -i RecoveryIP -p port_number [-f floating_IP]
    
    Where:
    -m QMname
    Specifies the queue manager that you are preparing for participation in a disaster recovery configuration.
    -r RecoveryName
    Specifies the name of the IBM MQ Appliance that is the recovery appliance.
    -i RecoveryIP
    Specifies the IP address of the recovery appliance.
    -p port_number
    Specifies the port that the data replication listener on each appliance uses.
    -f floating_IP
    If your HA appliances have static IP addresses assigned to the replication interface (eth20) that are in the same subnet, you can specify a floating IP address. The floating IP address is an IPv4 address that is used to replicate queue manager data from whichever HA appliance the queue manager is currently running on to the queue manager on the recovery appliance. Note that you do not physically configure an Ethernet port with this address. Select a free IP address in the same subnet as the replication ports on the two appliances, and specify it in the crtdrprimary command to make it the IP used for replication with the recovery appliance. You must specify a different floating IP address for each of the HA queue managers that you configure disaster recovery for.

    The crtdrprimary command configures the queue manager on both appliances in the HA pair, and reserves storage for the data snapshot on both appliances.

    The crtdrprimary command returns a crtdrsecondary command when it has completed, for example:
    
    Queue manager QM3 is prepared for Disaster Recovery replication.
    Now execute the following command on appliance mydrappl:
    crtdrsecondary -m QM3 -s 65536 -l myliveapp3 -i 198.51.100.10 -p 2015
    
    If you specified a floating IP address, the -i parameter is that floating IP address. If you omitted the -f floating_IP argument, the -i parameter is the static IP address of the replication interface of the appliance on which you ran the crtdrprimary command (or of the appliance that is the preferred location for the HA queue manager, if that is different). If the queue manager fails over to the other appliance in the HA pair, that appliance identifies itself to the DR appliance so that replication can continue.
  5. Copy the crtdrsecondary command and run it on the main appliance. This creates a secondary version of the queue manager, and queue manager data is replicated from the primary queue manager.
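
Two parts of step 4 and step 5 lend themselves to a quick check before you type anything on an appliance: confirming that a candidate floating IP address meets the constraints described for -f, and picking the crtdrsecondary line out of the crtdrprimary output. The following Python sketch illustrates both. The function names and the interface addresses 198.51.100.1/24 and 198.51.100.2/24 are hypothetical, and the subnet check assumes that you know the prefix length of the replication interfaces.

```python
import ipaddress
import re


def check_floating_ip(candidate, replication_interfaces, in_use=()):
    """Validate a floating IP address against the constraints for -f.

    replication_interfaces: the eth20 addresses of both HA appliances in
    CIDR form, for example "198.51.100.1/24".
    in_use: floating IP addresses already given to other HA queue managers.
    """
    ip = ipaddress.ip_address(candidate)
    if ip.version != 4:
        raise ValueError("the floating IP address must be IPv4")
    interfaces = [ipaddress.ip_interface(i) for i in replication_interfaces]
    if len({i.network for i in interfaces}) != 1:
        raise ValueError("replication interfaces are not in the same subnet")
    if ip not in interfaces[0].network:
        raise ValueError("floating IP is outside the replication subnet")
    if any(ip == i.ip for i in interfaces):
        raise ValueError("floating IP is already a static interface address")
    if any(ip == ipaddress.ip_address(other) for other in in_use):
        raise ValueError("floating IP is already used by another queue manager")
    return str(ip)


def extract_crtdrsecondary(output):
    """Return the crtdrsecondary command line from crtdrprimary output."""
    match = re.search(r"^[ \t]*(crtdrsecondary\b.*)$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no crtdrsecondary command found in the output")
    return match.group(1).strip()


# Sample output copied from the example in step 4.
sample_output = """Queue manager QM3 is prepared for Disaster Recovery replication.
Now execute the following command on appliance mydrappl:
crtdrsecondary -m QM3 -s 65536 -l myliveapp3 -i 198.51.100.10 -p 2015
"""
print(extract_crtdrsecondary(sample_output))

# Hypothetical replication interface addresses for the two HA appliances.
print(check_floating_ip("198.51.100.10", ["198.51.100.1/24", "198.51.100.2/24"]))
```

The check is only a local sanity test. It cannot tell whether the address is actually free on the network, so confirm that separately before you use it.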

You now have a fully operational DR/HA configuration again. If required, you can go on to restore the original configuration by failing the queue manager back to the main site and adding it back to the HA group there. Follow the same procedures that you used to fail over to the recovery site and to add the queue manager to the HA group there, but carry out the steps on the main site rather than the recovery site.