[Linux]

Replacing a failed node in a DR/HA configuration

If one of the nodes in either of your HA groups fails, you can replace it.

About this task

The procedure depends on whether the node that you are replacing is a primary or a secondary in the DR configuration. In either case, the new node must be configured identically to the node that it replaces: the same hostname, the same IP addresses, and so on.

You might also lose the entire HA group at your main or recovery site, in which case you must replace the whole HA group.

Procedure

  • For a replacement node that is a primary in the DR configuration, complete the following steps on the new node:
    1. Create an rdqm.ini file that matches the files on the other nodes, and then run the rdqmadm -c command (see Defining the Pacemaker cluster (HA group)).
    2. Run the crtmqm -sxs -rr p qmanager command to recreate each DR/HA RDQM (see Creating DR/HA RDQMs).
  • For a replacement node that is a secondary in the DR configuration, complete the following steps on the new node:
    1. Create an rdqm.ini file that matches the files on the other nodes, and then run the rdqmadm -c command (see Defining the Pacemaker cluster (HA group)).
    2. Run the crtmqm -sx -rr s qmanager command to recreate each DR/HA RDQM (see Creating DR/HA RDQMs).
  • To replace an entire HA group, complete the following steps:
    1. If you lost the entire HA group at the DR primary site (that is, the main site), fail over to the DR secondary site so that your DR/HA RDQMs keep running (see Switching over to a recovery node if a disaster occurs in a DR/HA configuration).
    2. Recreate the HA group on your three replacement nodes, as described in Configuring HA groups for DR/HA RDQMs.
    3. Recreate your DR/HA RDQMs on the new HA group as described in Creating DR/HA RDQMs.
    4. If required, perform a managed failover from your recovery site back to your main site.
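
The per-node steps for a single replacement node can be sketched as a small dry-run helper. This is an illustration only, not part of the product: the function names dr_recreate_options and print_replacement_steps are hypothetical, and the crtmqm options are copied as they appear in the steps above. The helper only prints the commands and runs nothing against IBM MQ. It assumes that a matching rdqm.ini file is already in place on the new node, and your queue managers might need further crtmqm options, as described in Creating DR/HA RDQMs.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from the procedure above for one
# replacement node. Nothing is executed against IBM MQ.

# Map the node's DR role to the crtmqm options quoted in the procedure.
# (Options are copied from the steps above, not independently verified.)
dr_recreate_options() {
    case "$1" in
        primary)   printf '%s\n' '-sxs -rr p' ;;
        secondary) printf '%s\n' '-sx -rr s' ;;
        *)         echo "unknown DR role: $1" >&2; return 1 ;;
    esac
}

# Usage: print_replacement_steps ROLE QMGR [QMGR...]
# ROLE is "primary" or "secondary"; each QMGR is a DR/HA RDQM name.
print_replacement_steps() {
    role=$1
    opts=$(dr_recreate_options "$role") || return 1
    shift
    # Step 1: define the Pacemaker cluster (rdqm.ini must already match).
    echo "rdqmadm -c"
    # Step 2: recreate each DR/HA RDQM on the new node.
    for qm in "$@"; do
        echo "crtmqm $opts $qm"
    done
}
```

For example, print_replacement_steps secondary QM1 QM2 prints the rdqmadm -c command followed by one crtmqm line for each queue manager, which you can review before running the commands by hand.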