If an appliance that belongs to a high availability (HA) group fails, you can replace the
appliance and then restore the HA group by following this procedure. You can also follow this
procedure if you want to replace a functioning appliance (for example, with a later
model).
Before you begin
When a node in an HA group fails (or is shut down), the queue managers fail over to the remaining
appliance in the group. You can continue running the queue managers on the remaining appliance while
you replace the failed or shut down appliance.
This procedure preserves only the HA queue managers that belong to the HA group on the failed
appliance. It does not preserve standalone queue managers, and you must take steps to recreate
the configuration of any disaster recovery queue managers that existed on the failed appliance (see
dspdrsecondary).
To restore high availability function after you replace or repair the failed appliance, you
configure the appliance so that it looks like the one it is replacing and then run a recreate HA
group command on the remaining appliance in the group.
You must ensure that both appliances are running the same level of firmware. If your new
appliance is running a later version of the firmware, you must downgrade your new appliance.
Note: This procedure preserves HA queue managers that were running on the failed or replaced
appliance. You must take steps to back up and manually restore any standalone queue managers that
were running on that appliance.
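The matching requirement described above (same appliance name, same firmware level, same Ethernet addresses) can be checked against your own notes before you recreate the HA group. The following Python sketch is illustrative only: the `ApplianceNotes` structure, its field names, and all sample values are assumptions made for this example, not something the appliance produces. You would fill in the values by hand from the configuration backup and from each appliance.

```python
from dataclasses import dataclass, field


@dataclass
class ApplianceNotes:
    """Hand-recorded settings for one appliance (hypothetical structure)."""
    name: str
    firmware: str
    ethernet: dict = field(default_factory=dict)  # interface name -> IP address


def find_mismatches(original, replacement):
    """Return a list of human-readable differences between the two sets of notes."""
    problems = []
    if original.name != replacement.name:
        problems.append(f"appliance name: {original.name!r} != {replacement.name!r}")
    if original.firmware != replacement.firmware:
        problems.append(f"firmware: {original.firmware!r} != {replacement.firmware!r}")
    # Compare every interface that appears in either set of notes.
    for iface in sorted(set(original.ethernet) | set(replacement.ethernet)):
        old = original.ethernet.get(iface)
        new = replacement.ethernet.get(iface)
        if old != new:
            problems.append(f"{iface}: {old!r} != {new!r}")
    return problems


# Made-up sample values, for illustration only.
old_notes = ApplianceNotes("CASTOR", "9.3.5.0", {"eth21": "10.0.0.1"})
new_notes = ApplianceNotes("CASTOR", "9.3.5.1", {"eth21": "10.0.0.1"})

for problem in find_mismatches(old_notes, new_notes):
    print(problem)
```

An empty result means that the notes agree. It does not prove that the appliances themselves are configured identically, so treat it as a checklist aid rather than a substitute for the verification steps in the procedure.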
Procedure
-
If the appliance you are replacing is running, you can prepare for its replacement by
completing the following steps:
-
Back up the appliance configuration. For release 9.3.5, see Secure backup. For earlier releases,
see Backing up or saving the appliance configuration.
-
For releases before 9.3.5, back up details of the messaging users, see Backing up messaging users.
-
Take a note of the Ethernet settings (this information is contained in the configuration backup
file).
- Use the dsphalink command to check whether the configuration uses the default interface,
eth21, for replication. Make a note if a custom link is used.
-
Take a note of the appliance name (this information is contained in the configuration backup
file).
-
Take a note of the exact version of firmware that the appliance is running.
-
Optionally, create a test HA queue manager that can be used for initial testing after the
replacement appliance is set up in the HA group.
-
Shut down the appliance (failing over queue managers to the other appliance in the HA group).
-
To configure the replacement appliance to match the original appliance, complete the following
steps:
-
Install the replacement appliance, ensuring that you connect all the Ethernet cables as they
were connected on the old appliance. See Installation of the appliance in a rack.
-
Configure the appliance by using one of the following methods:
- Restore from the configuration backup, if you were able to take one. See Secure restore.
Note: If you restore to a later model of appliance, resources that are not applicable to the
replacement appliance model are disregarded but might cause startup errors (see startup errors).
After you correct the errors, you might need to run write mem in the configuration console and
restart the appliance.
- Run the installation wizard, see Initializing the appliance.
- Use the command line interface, see Configuring the appliance.
- Use the web UI, see Configuring the appliance.
You must configure the appliance to match its predecessor. (If you do not know the appliance
name, use the dsphagrp command from the mqcli command line of the other appliance to discover it.)
-
Install the version of the firmware that the original appliance was running, see Installing new firmware.
- For release 9.3.5, restore the backup. See Secure restore. (If you were unable to take backups from your
original appliance, you can take backups from the other appliance in the HA group and restore them
to your new appliance, see Secure backup.)
-
Verify that the Ethernet IP addresses and system name configured on the new appliance match
those on the original appliance.
-
To prepare the surviving appliance in the HA group:
-
Back up the queue managers. See Backing up a queue manager.
-
Change the preferred location setting on all the HA queue managers to this appliance. See Managing queue manager locations in a high availability group.
-
If your HA queue managers are also configured for disaster recovery (DR), ensure that they have
the DR Primary role on this appliance. See Viewing the status of a disaster recovery queue manager. (If any of these queue
managers have the Partitioned status, resolve that now rather than waiting
until the other appliance is restored.)
-
To recreate the HA group:
-
On the new appliance, issue the prepareha command from the mqcli command line:
prepareha -s secret_key -a IP_address
Where secret_key specifies a string that is used to generate a short-lived password, and
IP_address specifies the IP address of the HA group primary interface on the other appliance in
the group.
-
On the other appliance, issue the crthagrp command from the mqcli command line to recreate the
HA group:
crthagrp -s secret_key -r
The HA group is recreated. The HA queue managers continue to run on the existing appliance
while you restore the HA group, and do not fail back to the new appliance unless you have designated
the new appliance as the preferred location (or other failover conditions are met, see
Causes of HA failover).
- If your original configuration used a custom replication link, use the
sethalink command on both appliances to configure the custom link, see Configuring custom HA replication interfaces.
-
To validate the HA group:
-
Check the output of the crthagrp command and ensure that all the HA queue
managers were successfully recreated on the new appliance. (If any of the HA queue managers also had
DR configured, you should also see messages about DR.)
-
Check the status of each of the queue managers on the surviving appliance and on the
replacement appliance. The HA status should be Normal. The DR status should be Normal if the queue
manager is an HA primary; if it is an HA secondary, there is no DR status field. If you see the
status synchronization in progress, repeat the status check.
-
If you have DR configured, run the dspdrlink command on the replacement
appliance and check that there are no errors in the output.
-
If you created a test queue manager as part of your preparation, try failing it over. Also try
creating a new HA queue manager.
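The status rules in the validation step can be written down as a small decision function, which some people find easier to follow than prose. This is a sketch only: the function name, its parameters, and the three return values are assumptions made for illustration, and you would supply the status strings yourself after reading them from each appliance. It does not run or parse any appliance command.

```python
def classify_status(ha_status, ha_role, dr_status=None, dr_configured=False):
    """Classify one queue manager's reported status as 'ok', 'retry', or 'problem'.

    ha_role is 'primary' or 'secondary'. dr_status is None when no DR status
    field is shown. All parameter names are hypothetical.
    """
    ha = ha_status.strip().lower()
    dr = dr_status.strip().lower() if dr_status is not None else None

    # Synchronization is still running: check again later.
    if "synchronization in progress" in (ha, dr):
        return "retry"
    if ha != "normal":
        return "problem"
    if not dr_configured:
        return "ok"
    if ha_role == "primary":
        # A DR-configured HA primary should report a Normal DR status.
        return "ok" if dr == "normal" else "problem"
    # A DR-configured HA secondary should show no DR status field at all.
    return "ok" if dr is None else "problem"


print(classify_status("Normal", "primary", "Normal", dr_configured=True))
print(classify_status("Synchronization in progress", "secondary"))
```

A result of 'retry' corresponds to repeating the status check, as described in the procedure. Any 'problem' result needs investigation before you fail over the test queue manager.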
Example
In this example, you configure a new appliance named 'CASTOR'. The surviving appliance is named
'POLLUX', and POLLUX is running the HA queue manager 'HA_QM1' and the HA/DR queue manager
'DRHA_QM1'. You prepare CASTOR by running the following command:
prepareha -s SuperSecretPassword -a 192.168.123.200
You then run the following command on POLLUX to recreate the group:
crthagrp -s SuperSecretPassword -r
POLLUX outputs the following messages as the HA group is recreated:
Creating high availability configuration on appliance 'CASTOR'.
Recreating high availability configuration for queue manager 'HA_QM1' on appliance 'CASTOR'.
Recreation completed for queue manager 'HA_QM1' on appliance 'CASTOR'.
Recreating high availability configuration for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
Recreating disaster recovery configuration for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
Recreation completed for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
This Appliance: Online
Appliance CASTOR: Online
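If you capture the crthagrp output to a file or a string, a short script can confirm that every queue manager that started recreation also reported completion. The following Python sketch assumes that the messages have exactly the wording shown in this example; the wording on your firmware level might differ, and the function name is hypothetical.

```python
import re

# Patterns copied from the example messages above; treat them as assumptions.
STARTED = re.compile(
    r"Recreating high availability configuration for queue manager '([^']+)'"
)
COMPLETED = re.compile(r"Recreation completed for queue manager '([^']+)'")


def incomplete_queue_managers(output):
    """Return queue managers that began recreation but never reported completion."""
    started = []
    completed = set()
    for line in output.splitlines():
        match = STARTED.search(line)
        if match and match.group(1) not in started:
            started.append(match.group(1))
        match = COMPLETED.search(line)
        if match:
            completed.add(match.group(1))
    return [qm for qm in started if qm not in completed]


sample_output = """\
Creating high availability configuration on appliance 'CASTOR'.
Recreating high availability configuration for queue manager 'HA_QM1' on appliance 'CASTOR'.
Recreation completed for queue manager 'HA_QM1' on appliance 'CASTOR'.
Recreating high availability configuration for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
Recreating disaster recovery configuration for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
Recreation completed for queue manager 'DRHA_QM1' on appliance 'CASTOR'.
This Appliance: Online
Appliance CASTOR: Online
"""

print(incomplete_queue_managers(sample_output))  # prints []
```

An empty list means that each recreation message has a matching completion message. The script checks only the text it is given, so you still need to check the status of each queue manager on both appliances.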
What to do next
After you validate that the appliance replacement is successful and that the new appliance is
fully functional, reset the preferred locations of your HA queue managers as required and resume
normal operation.