If you lose one of the nodes in a disaster recovery configuration, you can replace the
node and restore the disaster recovery configuration by following this procedure.
About this task
If a disaster occurs such that the node in the main site is beyond repair, you can replace the
failed node while the queue manager runs on the recovery node and then restore the original disaster
recovery configuration. The replacement node must assume the identity of the failed node: the name
and IP address must be the same.
You must either be logged in as root or logged in as a user who belongs to the
mqm group and has the necessary sudo configuration.
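The user requirement above can be checked before you start. The following is a minimal sketch only: the function name `can_administer_rdqm` is hypothetical and is not part of IBM MQ, and the check covers root or mqm group membership, not the sudo configuration.

```shell
#!/bin/sh
# Return 0 if the given user is root or the group list contains mqm.
# Usage: can_administer_rdqm USER "SPACE-SEPARATED GROUP LIST"
# Hypothetical helper for illustration; it does not inspect sudo rules.
can_administer_rdqm() {
    user=$1
    groups=$2
    [ "$user" = "root" ] && return 0
    for g in $groups; do
        [ "$g" = "mqm" ] && return 0
    done
    return 1
}

# Check the user who is currently logged in.
if can_administer_rdqm "$(id -un)" "$(id -Gn)"; then
    echo "OK: root or a member of mqm (sudo configuration is not checked here)"
else
    echo "Not root and not in mqm: log in as a suitable user before continuing"
fi
```

Run the check on both the recovery node and the replacement node, because commands are issued on each of them during the procedure.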
Procedure
After you lose the queue manager on the main site, complete the following
steps:
-
On the recovery node, run the following command to make the secondary queue manager assume the
primary role:
rdqmdr -m QMname -p
Where QMname is the name of the queue manager.
-
Retrieve the command that you will need to run on the replacement primary node to reconfigure
disaster recovery:
rdqmdr -m QMname -d
Copy the output of this command.
-
On the recovery node, run the following command to start the queue manager:
strmqm QMname
-
Ensure that your applications reconnect to the queue manager on the recovery node. If you
defined your channels with a list of alternative connection names that specifies both your
primary and secondary queue managers, your applications automatically connect to the new
primary queue manager.
-
Replace the failed node on your main site and configure it to have the same name and IP address
that you used for disaster recovery on the original node. Then configure disaster recovery by
running the crtmqm command that you copied in step 2. You now have a secondary
instance of the queue manager, and the primary instance synchronizes its data with the secondary
instance.
-
End the current primary instance.
-
After the synchronization has completed, make the primary instance that is running on the
recovery node into the secondary once more:
rdqmdr -m QMname -s
-
On the replacement primary node, make the secondary instance of the queue manager into the
primary instance:
rdqmdr -m QMname -p
-
On the replacement primary node, start the queue manager:
strmqm QMname
You have now restored the configuration as it was before the failure at your main site.
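The order of the commands across the two nodes can be laid out as a dry-run sketch. It only prints commands and executes nothing. The function name `dr_replace_plan`, the phase names, and the queue manager name QM1 are hypothetical. Using `endmqm` to end the primary instance and `rdqmdr -s` to return the recovery node to the secondary role are assumptions based on the standard IBM MQ commands, so verify them against the command reference for your version. The crtmqm command copied from the output of `rdqmdr -m QMname -d` is specific to your configuration and is therefore not reproduced here.

```shell
#!/bin/sh
# Print the commands for one phase of the node replacement; nothing is executed.
# Usage: dr_replace_plan PHASE QMNAME
#   takeover  commands for the recovery node after the main site is lost
#   failback  commands to run once the replacement node has synchronized
# Each output line is prefixed with the node on which to run the command.
dr_replace_plan() {
    phase=$1
    qm=$2
    case $phase in
        takeover)
            echo "recovery: rdqmdr -m $qm -p"     # assume the primary role
            echo "recovery: rdqmdr -m $qm -d"     # print the crtmqm command to copy
            echo "recovery: strmqm $qm"           # start the queue manager
            ;;
        failback)
            echo "recovery: endmqm $qm"           # assumption: end the current primary
            echo "recovery: rdqmdr -m $qm -s"     # assumption: back to the secondary role
            echo "replacement: rdqmdr -m $qm -p"  # replacement node becomes primary
            echo "replacement: strmqm $qm"        # start on the replacement node
            ;;
        *)
            echo "unknown phase: $phase" >&2
            return 1
            ;;
    esac
}

dr_replace_plan takeover QM1
dr_replace_plan failback QM1
```

Between the two phases, you replace the failed node and run the copied crtmqm command on it. Keeping the plan as printed text rather than executing it is deliberate: each command should be run by hand on the correct node, after confirming that the previous step succeeded.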