HA notification examples

This topic illustrates common sequences of notifications that might be issued by a High Availability (HA) configuration.

Creating an HA queue manager

When you create a new HA queue manager, a series of notifications is issued. First, issue the command to create the queue manager:
crtmqm -sx QM_HA
As the creation progresses, the following sequence of state notifications is seen.
Replication notifications
AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'

This state is reported while the data replication resources are being created for the queue manager on the remote appliance. It indicates that the remote appliance has not yet received data from the local appliance.

AMQ3598W: HA status for queue manager QM_HA is 'Synchronization in progress'

After the data replication resources on both appliances connect, they begin synchronizing the data from the local appliance to the remote appliance. If replication is interrupted, for example, by a network interruption, the state returns to Inconsistent.

AMQ3599I: HA status for queue manager QM_HA is 'Normal'

After the secondary appliance has synchronized all the data from the primary appliance, all disk writes on the primary appliance are synchronously replicated to the secondary appliance. This is the Normal state.
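
The progression between these three replication states can be summarized as a small transition table. The following Python sketch is only an illustrative model of the sequence described above. The event names (connected, sync_complete, interrupted) and the function next_replication_status are hypothetical and are not part of the appliance or its command set.

```python
# Hypothetical model of the replication status progression described above.
INCONSISTENT = "Inconsistent"
SYNCING = "Synchronization in progress"
NORMAL = "Normal"

# (current status, event) -> next status. Event names are invented for this sketch.
TRANSITIONS = {
    (INCONSISTENT, "connected"): SYNCING,
    (SYNCING, "sync_complete"): NORMAL,
    (SYNCING, "interrupted"): INCONSISTENT,
}


def next_replication_status(status, event):
    """Return the next status for an event.

    Transitions that this section does not describe are left unmodeled
    and return the status unchanged.
    """
    return TRANSITIONS.get((status, event), status)


# Walk through a creation in which the network is interrupted once.
status = INCONSISTENT
for event in ("connected", "interrupted", "connected", "sync_complete"):
    status = next_replication_status(status, event)
print(status)  # prints "Normal"
```

The table deliberately covers only the transitions named in this section; other statuses, such as those reported when an appliance is suspended or unreachable, are outside its scope.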

High Availability notifications

In addition to the notifications about the health of data replication, further notifications report the high availability state.

AMQ3912I: HA preferred appliance for queue manager QM_HA is 'Appliance1'

This message is issued on both appliances to indicate which appliance has been selected as the preferred location for the queue manager.

The appliance on which crtmqm is issued is automatically selected as the preferred location. You can change the preferred location by issuing the sethapreferred command on the remote appliance.

AMQ3902I: HA role for queue manager QM_HA is primary

This message is seen on the appliance where crtmqm is issued, as it promotes the local instance of the queue manager to the primary role.

AMQ3903I: HA role for queue manager QM_HA is secondary

The remote appliance reports it is taking on the secondary role for the queue manager being created.

AMQ3913I: HA control for queue manager QM_HA is enabled

The final notification triggered by the crtmqm command indicates the queue manager has been put under control of the HA subsystem, which then monitors the two appliances to determine where best to run the queue manager.

Shortly after this message is issued the queue manager automatically starts on the primary appliance.
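
Each notification shown in this topic starts with a message identifier whose final letter indicates its severity (I for informational, W for warning, E for error, S for severe). When notifications are collected for monitoring, it can help to split each line into its parts. The following Python sketch assumes that notifications are available as text lines in exactly the form shown in this topic. The function name parse_notification and the returned field names are illustrative and are not an appliance API.

```python
import re
from typing import NamedTuple, Optional

# Severity letters as used by the message identifiers in this topic.
SEVERITIES = {"I": "informational", "W": "warning", "E": "error", "S": "severe"}

# Matches lines such as "AMQ3913I: HA control for queue manager QM_HA is enabled".
NOTIFICATION_RE = re.compile(r"^(AMQ\d{4})([IWES]):\s*(.*?)\s*$")


class Notification(NamedTuple):
    msg_id: str    # for example "AMQ3913I"
    severity: str  # for example "informational"
    text: str      # remainder of the line, without surrounding whitespace


def parse_notification(line: str) -> Optional[Notification]:
    """Parse one notification line; return None if the line does not match."""
    match = NOTIFICATION_RE.match(line)
    if match is None:
        return None
    number, letter, text = match.groups()
    return Notification(number + letter, SEVERITIES[letter], text)


note = parse_notification("AMQ3913I: HA control for queue manager QM_HA is enabled")
print(note.msg_id, note.severity)  # prints "AMQ3913I informational"
```

Returning None for lines that do not match keeps the parser tolerant of other output, such as command echoes, that might be mixed in with the notifications.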

Suspending an HA appliance

If an HA appliance is suspended by using the sethagrp -s command, the following sequence of notifications is seen:

AMQ3911W: HA group suspended on the local appliance
AMQ3910W: HA group suspended on remote appliance 'Appliance1'

Both appliances report which appliance is being suspended.

AMQ3569I: HA role for queue manager QM_HA is none

Any queue managers running in the primary role on the suspended appliance are stopped and now have no HA role. Any queue managers in the secondary role are also reported as having no role.

AMQ3595W: HA status for queue manager QM_HA is 'Inactive'

For a brief period, any queue manager that is switching over reports Inactive, showing that both appliances are in the secondary role.

AMQ3902I: HA role for queue manager QM_HA is primary

The relocated queue manager is then promoted to the primary role and started.

AMQ3576E: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is unavailable

This notification reports that HA replication is no longer available as a consequence of the appliance suspending and stopping the data replication resources for all HA queue managers.

AMQ3591W: HA status for queue manager QM_HA is 'This appliance in standby mode'
AMQ3590W: HA status for queue manager QM_HA is 'Secondary appliance in standby mode'

As the appliance that is suspending completes the job of stopping all the HA and data replication resources, both appliances report that the HA queue manager is in standby mode.

Resuming an HA appliance

When a suspended appliance is resumed using the sethagrp -r command, the following sequence of notifications is seen:

AMQ3909I: HA group resumed on the local appliance
AMQ3908I: HA group resumed on remote appliance 'Appliance1'

Both appliances report that the suspended node is being resumed.

AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available

As the appliance restarts, the HA and data replication resources for each queue manager can again begin replication.

AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'
AMQ3598W: HA status for queue manager QM_HA is 'Synchronization in progress'
AMQ3599I: HA status for queue manager QM_HA is 'Normal'

As the data replication reconnects, the resuming appliance recognizes that the queue manager data is not up to date with the remote appliance and begins synchronization.

AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3595W: HA status for queue manager QM_HA is 'Inactive'
AMQ3902I: HA role for queue manager QM_HA is primary

Any queue managers that have their preferred location on the resumed appliance are stopped and demoted to secondary where they are currently running, so that they can be promoted to primary and started on their preferred location.
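
When verifying that an appliance resumed cleanly, one option is to check that the notifications listed above appear in the documented order among whatever else was logged. The following Python sketch makes that check. It assumes that the message identifiers have already been extracted from the collected notifications, and the names RESUME_SEQUENCE and contains_in_order are hypothetical. The expected list covers only the notifications that this section shows for the resumed appliance before any role changes.

```python
from typing import Iterable, Sequence

# Identifiers taken from the resume sequence in this section, as seen on the
# appliance being resumed. Assumption: a single HA queue manager and no other
# activity that reorders these messages.
RESUME_SEQUENCE = ("AMQ3909I", "AMQ3577I", "AMQ3592E", "AMQ3598W", "AMQ3599I")


def contains_in_order(observed: Iterable[str], expected: Sequence[str]) -> bool:
    """Return True if `expected` occurs within `observed` as an ordered subsequence."""
    remaining = iter(observed)
    # Each any(...) consumes the shared iterator up to and including its match,
    # so later identifiers can only match later observations.
    return all(any(seen == wanted for seen in remaining) for wanted in expected)


observed_ids = [
    "AMQ3909I", "AMQ3577I", "AMQ3592E", "AMQ3598W", "AMQ3599I",
    "AMQ3903I", "AMQ3595W", "AMQ3902I",
]
print(contains_in_order(observed_ids, RESUME_SEQUENCE))  # prints "True"
```

A subsequence check is used rather than an exact match because unrelated notifications, or notifications for other queue managers, might be interleaved with the ones of interest.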

Losing communication with remote appliance

Several different notifications can be seen when communication is lost between HA appliances, depending on which network interfaces are affected.

Replication interface
AMQ3576E: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is unavailable
AMQ3594E: HA status for queue manager QM_HA is 'Remote appliance unavailable'

Losing communication for replication is reported and results in a state change to Remote appliance unavailable. From this point the data on the secondary appliance becomes increasingly out of date.

Heartbeat interfaces
AMQ3904E: HA heartbeat connection to secondary appliance 'Appliance1' using interface 'eth13' is unavailable

The appliance issues a notification when communication is unavailable on a single heartbeat link, that is, the Primary interface (eth13) or the Alternate interface (eth17). The appliance HA configuration can tolerate the loss of a single heartbeat link.

AMQ3906S: HA secondary appliance 'Appliance1' is unavailable
AMQ3597E: HA status for queue manager QM_HA is 'Secondary appliance unavailable'

If both links are unavailable, a further notification reports that the remote appliance is unavailable. This status is also reported for each affected HA queue manager.

AMQ3902I: HA role for queue manager QM_HA is primary

If the remote appliance is unavailable, all queue managers currently in the secondary role are promoted to primary and started.
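
The heartbeat behavior described above amounts to a simple rule: losing one of the two links is tolerated, and only the loss of both makes the remote appliance unavailable. The following Python sketch models which notifications this section leads you to expect for a given link state. It is a simplified illustration, not appliance logic. The function expected_heartbeat_notifications is hypothetical, and the interface names are the ones used in the examples in this topic.

```python
from typing import List


def expected_heartbeat_notifications(primary_up: bool, alternate_up: bool) -> List[str]:
    """List the message identifiers this section associates with a link state."""
    expected = []
    # One AMQ3904E is reported for each unavailable heartbeat link.
    for interface, link_up in (("eth13", primary_up), ("eth17", alternate_up)):
        if not link_up:
            expected.append(f"AMQ3904E ({interface})")
    # Only when both links are down is the remote appliance reported as
    # unavailable, together with the status message for each queue manager.
    if not primary_up and not alternate_up:
        expected += ["AMQ3906S", "AMQ3597E"]
    return expected


print(expected_heartbeat_notifications(primary_up=False, alternate_up=True))
# prints "['AMQ3904E (eth13)']"
print(expected_heartbeat_notifications(primary_up=False, alternate_up=False))
# prints "['AMQ3904E (eth13)', 'AMQ3904E (eth17)', 'AMQ3906S', 'AMQ3597E']"
```

The sketch omits the AMQ3902I promotion message because whether it appears depends on which queue managers are in the secondary role on the reporting appliance.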

Regaining communication with remote appliance

As with losing communication with the remote appliance, different notifications are issued as each communication interface between the HA appliances becomes available.

Regaining a heartbeat interface
AMQ3905I: HA heartbeat connection to secondary appliance 'Appliance1' using interface 'eth13' is available

The appliance issues a notification when each heartbeat connection, that is, the Primary interface or the Alternate interface, becomes available.

AMQ3907I: HA secondary appliance 'Appliance1' is available

If regaining a heartbeat connection allows HA coordination between the appliances to resume, the appliance notifies that the secondary appliance is available.

Regaining replication
AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available

A notification is produced for each queue manager when the data replication connection is re-established.

Typically, this notification is followed shortly by the secondary appliance reporting the queue manager data as Inconsistent and then synchronizing from the primary appliance.

Partitioned behavior

If all HA heartbeat and replication links are lost but the remote appliance is still operating, the same sequence of states occurs on both appliances, resulting in the queue manager running simultaneously on both. This situation results in the queue manager data deviating in a manner that cannot be reconciled.
AMQ3596S: HA status for queue manager QM_HA is 'Partitioned'

When the replication links recover, the data replication resources detect the deviation in queue manager data and report that the queue manager has become Partitioned. If heartbeat links are available, this will be reported on both appliances. Duplicate notifications are possible if both appliances simultaneously detect the deviation.

On detecting the partitioned state, the replication connection becomes unavailable.

Additionally, one of the running instances is stopped and demoted to the secondary role. Typically, the queue manager remains running on the preferred node.
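
Because both appliances can detect the deviation at the same time, a monitoring script that collects notifications might see the Partitioned status more than once. The following Python sketch collapses consecutive identical notification lines. It assumes that notifications have been gathered into a list of text lines, and the function name collapse_duplicates is hypothetical. The sample lines are for illustration only.

```python
from itertools import groupby
from typing import Iterable, List


def collapse_duplicates(lines: Iterable[str]) -> List[str]:
    """Keep one copy of each run of consecutive identical lines."""
    # groupby yields one (key, run) pair per run of equal adjacent items.
    return [line for line, _run in groupby(lines)]


# Sample input for illustration: the Partitioned status reported twice.
collected = [
    "AMQ3596S: HA status for queue manager QM_HA is 'Partitioned'",
    "AMQ3596S: HA status for queue manager QM_HA is 'Partitioned'",
    "AMQ3903I: HA role for queue manager QM_HA is secondary",
]
for line in collapse_duplicates(collected):
    print(line)
```

Only adjacent repeats are collapsed, so a status that genuinely recurs later in the sequence is still reported.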

Resolving a partitioned state

After following the steps in Resolving a partitioned problem in a high availability configuration to identify which appliance holds the 'winning' data for partition resolution, use the makehaprimary command on the identified appliance.
AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3902I: HA role for queue manager QM_HA is primary

If the makehaprimary command is run on the current secondary appliance, the queue manager instance on the current primary appliance is demoted so that the instance can be promoted and started on the appliance where the command was issued.

AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available
AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'
AMQ3598W: HA status for queue manager QM_HA is 'Synchronization in progress'
AMQ3599I: HA status for queue manager QM_HA is 'Normal'

Data replication is then re-established between the appliances. The data that is to be replaced is marked as Inconsistent, which allows synchronization from the chosen primary node.

AMQ3595W: HA status for queue manager QM_HA is 'Inactive'
AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3902I: HA role for queue manager QM_HA is primary

After synchronization has completed, if makehaprimary was not issued on the preferred node, the queue manager is moved back to the preferred node.