[MQ 9.4.2 Feb 2025]
Native HA Cross-Region Replication
You can implement a Native HA Cross-Region Replication (CRR) configuration based on the Native HA solution. The CRR configuration provides disaster recovery.
See Native HA for an overview of the Native HA solution.
Native HA uses a group of three queue manager instances to maintain a replicated copy of log data. The group automatically elects an active instance by comparing and choosing the best copy of data. CRR extends log replication to include a potentially geographically distant group (region) that is also highly available. When there is a planned or unplanned outage at the main site, the group at the recovery site can take over the work. You initially define which role a group plays, and then switch the role of a group manually as required to complete a planned switchover, or when an unplanned outage occurs. The log data is replicated to the recovery group asynchronously, so there might be some data loss when a switchover following an unplanned outage occurs.
Requested roles and Effective roles
A Requested role is the GroupRole configuration setting for a Native HA instance, specified in the NativeHALocalInstance stanza of its qm.ini. The setting requests that an instance co-operate with the other instances in its group to fulfill a role. Once a majority of instances share the same Requested role, an election can take place to choose a leader of the group. When an instance wins an election, its Requested role is compared with the Last role that the instance used, and an Effective role is derived from the comparison. See NativeHAInstance stanza of the qm.ini file and NativeHALocalInstance stanza of the qm.ini file.
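The majority condition that gates an election can be illustrated with a small sketch. This is not IBM MQ code: the function name `majority_requested_role` and the use of plain strings for roles are assumptions made for illustration, and the sketch models only the "majority of instances share the same Requested role" rule described above.

```python
from collections import Counter
from typing import List, Optional


def majority_requested_role(requested_roles: List[str]) -> Optional[str]:
    """Return the Requested role shared by a strict majority of instances.

    Hypothetical helper: returns None when no role has a majority, which in
    this sketch means that an election cannot yet take place.
    """
    if not requested_roles:
        return None
    role, count = Counter(requested_roles).most_common(1)[0]
    # A strict majority means more than half of the instances in the group.
    return role if count > len(requested_roles) // 2 else None
```

For a three-instance group, two instances requesting `"Live"` are enough for the sketch to report `"Live"`, while an even split across two instances reports no majority.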
- Live role
A Live group is a regular Native HA queue manager consisting of three instances. One instance is elected as the active one and this instance accepts new work from applications. An existing Native HA queue manager can be migrated to a Native CRR configuration.
If a Requested role is not explicitly specified in the qm.ini, the default is considered to be Live. An elected leader of a Live group is also known as the active queue manager. This instance can accept application and tooling connections to perform new messaging work and manage queue manager objects.
- Recovery role
A Recovery group also consists of three instances and one of these instances is elected as leader. The leader acts only as a proxy, accepting work sent from a paired Live group. No application connections are permitted to the Recovery group, although some control commands are allowed so that operations such as ending the queue manager instance can be completed.
The NativeHARecoveryGroup stanza in the qm.ini configuration file of the Live group identifies the network address for the Recovery group.
- Pending role transitions
A planned switchover, in which the current Live group becomes the new Recovery group and the current Recovery group becomes the new Live group, requires co-ordination between the groups. This co-ordination ensures that the logs are identical and achieves an RPO (recovery point objective) of zero.
While the two groups are co-ordinating the switch between Recovery and Live, both groups temporarily adopt an effective Pending role.
- Recovery role to Live role (Pending Live)
A group that has previously operated in a Recovery role and intends to change to a Live role decides when to adopt the Live role based on whether another group is configured.
If there is no other configured group, it transitions to the Live role immediately.
If another group is configured and enabled, the transition to a Live role is blocked until the other group adopts a Recovery or Pending Recovery role and both groups are in sync.
- Live role to Recovery role (Pending Recovery)
A group that has previously operated in a Live role and intends to change to a Recovery role decides when to adopt the Recovery role based on whether another group is configured.
If another group is configured and enabled, the transition to a Recovery role is blocked until the other group adopts a Live or Pending Live role and both groups are in sync.
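The transition rules above can be summarized as a small decision function. The following is a minimal sketch, not the product's implementation: the `Role` enum, the `effective_role` function, and its parameters are hypothetical names. Two behaviors are assumptions that the text does not state: a group whose Requested role equals its Last role simply keeps that role, and a Live group with no other configured group moves to Recovery immediately (by symmetry with the Recovery to Live case).

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    LIVE = "Live"
    RECOVERY = "Recovery"
    PENDING_LIVE = "Pending Live"
    PENDING_RECOVERY = "Pending Recovery"


def effective_role(requested: Role, last: Role,
                   other_group_role: Optional[Role], in_sync: bool) -> Role:
    """Derive an Effective role for a group leader (illustrative only).

    other_group_role is None when no other group is configured and enabled.
    """
    if requested not in (Role.LIVE, Role.RECOVERY):
        raise ValueError("a Requested role is either Live or Recovery")
    if requested == last:
        return requested  # ASSUMPTION: no role change, nothing to co-ordinate
    if other_group_role is None:
        # Recovery to Live is immediate per the text; Live to Recovery with no
        # other group is an ASSUMPTION made by symmetry.
        return requested
    if requested is Role.LIVE:
        ready = other_group_role in (Role.RECOVERY, Role.PENDING_RECOVERY)
        return Role.LIVE if ready and in_sync else Role.PENDING_LIVE
    ready = other_group_role in (Role.LIVE, Role.PENDING_LIVE)
    return Role.RECOVERY if ready and in_sync else Role.PENDING_RECOVERY
```

During a planned switchover, each group would evaluate something like this repeatedly: both report a Pending role until the other side has adopted the complementary role and the logs are in sync.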
Replication and rebasing
While the two groups in a Native HA CRR configuration are connected, log updates are passed to the leader of the recovery group, which passes them on to the other recovery instances. This is known as replication.
If the network connection between the two groups is lost, the recovery group might be unable to catch up by using replicated logs (for example, because the live group has reused log extents that are needed for the catch-up). In this case, the recovery leader performs a rebase. A rebase is the process of discarding the received log data and rebuilding the queue manager from a complete set of log data sent from the live group. Once the recovery leader has completed the rebase, the logs are replicated to the recovery instances.
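The choice between catching up and rebasing can be pictured as a comparison of log positions. This is a sketch under stated assumptions: extents are modeled as increasing integers, and `needs_rebase` and its parameter names are hypothetical, not part of any IBM MQ interface.

```python
def needs_rebase(recovery_next_extent: int,
                 live_oldest_retained_extent: int) -> bool:
    """Return True when catch-up by replication is no longer possible.

    ASSUMPTION: extents are numbered in increasing order, and the live group
    has reused (no longer retains) every extent below
    live_oldest_retained_extent.
    """
    # If the next extent the recovery group needs has already been reused by
    # the live group, the only option left in this model is a full rebase.
    return recovery_next_extent < live_oldest_retained_extent
```

For example, a recovery group that still needs extent 5 when the live group retains only extent 8 onward would have to rebase, while one that needs extent 8 or later could catch up through normal replication.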
If a problem is encountered during a rebase operation, the queue manager might be unable to start on any of the members of the recovery group. To avoid this possibility, a backup of the log is taken before the rebase operation commences. The queue manager can then be restarted using the backup log data if the rebase fails. The backup log is deleted after the rebase has succeeded.
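The backup-then-rebase sequence follows a common pattern, sketched here with ordinary directories. This is an illustration only and does not reflect how the queue manager stores or manages its logs: `rebase_with_backup`, the `.backup` directory suffix, and the `rebuild` callback are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def rebase_with_backup(log_dir: Path, rebuild: Callable[[Path], None]) -> bool:
    """Back up log_dir, rebuild it, and restore the backup if the rebuild fails.

    rebuild is a hypothetical callback that repopulates the emptied log_dir
    from a complete set of log data. Returns True if the rebase succeeded.
    """
    backup_dir = log_dir.with_name(log_dir.name + ".backup")
    shutil.copytree(log_dir, backup_dir)  # take the backup before the rebase
    try:
        shutil.rmtree(log_dir)            # discard the received log data
        log_dir.mkdir()
        rebuild(log_dir)                  # rebuild from the complete log data
    except Exception:
        # The rebase failed: put the backup back so a restart remains possible.
        shutil.rmtree(log_dir, ignore_errors=True)
        backup_dir.rename(log_dir)
        return False
    shutil.rmtree(backup_dir)             # delete the backup only on success
    return True
```

While the rebuild is running, the backup and the new log data exist side by side, which is why this pattern needs roughly twice the storage of the log alone.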
Because of the requirement for a backup log, a Native HA CRR configuration requires at least twice the storage space of a Native HA configuration.