HA notification examples
This topic illustrates common sequences of notifications that might be issued by a High Availability (HA) configuration.
Creating an HA queue manager
crtmqm -sx QM_HA
- Replication notifications
-
AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'
As the data replication resources are created for the queue manager on the remote appliance, this state indicates that the remote appliance has yet to receive data from the local appliance.
- High Availability notifications
-
Alongside the notifications relating to the health of the data replication are additional notifications relating to the high availability state.
Suspending an HA appliance
If an HA appliance is suspended using the sethagrp -s command, the following sequence of notifications is seen:
AMQ3911W: HA group suspended on the local appliance
AMQ3910W: HA group suspended on remote appliance 'Appliance1'
Both appliances report which appliance is being suspended.
AMQ3569I: HA role for queue manager QM_HA is none
Any queue managers running in the primary role on the suspended appliance are stopped and now have no HA role. Any queue managers in the secondary role are also reported as having no role.
AMQ3595W: HA status for queue manager QM_HA is 'Inactive'
For a brief period, any queue manager that is switching over reports Inactive, showing that both appliances are in the secondary role.
AMQ3902I: HA role for queue manager QM_HA is primary
The relocated queue manager is then promoted to Primary and started.
AMQ3576E: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is unavailable
This notification reports that HA replication is no longer available as a consequence of the appliance suspending and stopping the data replication resources for all HA queue managers.
AMQ3591W: HA status for queue manager QM_HA is 'This appliance in standby mode'
AMQ3590W: HA status for queue manager QM_HA is 'Secondary appliance in standby mode'
As the appliance that is suspending completes the job of stopping all the HA and data replication resources, both appliances report that the HA queue manager is in standby mode.
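Every notification in this topic has the same shape: a message ID ending in a severity letter, a colon, then the message text. The following is a minimal Python sketch of splitting such a line into its parts, for example when filtering a captured log. The names parse_notification and Notification are hypothetical, and the severity meanings assigned to the letters I, W, E, and S are an assumption rather than something stated in this topic.

```python
import re
from typing import NamedTuple, Optional

# Assumed meaning of the trailing letter of the message ID (not stated in this topic).
SEVERITIES = {"I": "informational", "W": "warning", "E": "error", "S": "severe"}

# Message ID (AMQ + 4 digits), severity letter, colon, then free text.
_PATTERN = re.compile(r"^(AMQ\d{4})([IWES]):\s*(.*)$")


class Notification(NamedTuple):
    message_id: str
    severity: str
    text: str


def parse_notification(line: str) -> Optional[Notification]:
    """Split one notification line into ID, severity, and text; None if it does not match."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    number, letter, text = match.groups()
    return Notification(number + letter, SEVERITIES[letter], text)
```

Applied to the suspend sequence above, this would let a script treat the W messages as expected during a planned suspension while still flagging AMQ3576E for follow-up.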
Resuming an HA appliance
When a suspended appliance is resumed using the sethagrp -r command, the following sequence of notifications is seen:
AMQ3909I: HA group resumed on the local appliance
AMQ3908I: HA group resumed on remote appliance 'Appliance1'
Both appliances report that the suspended node is being resumed.
AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available
As the appliance restarts, the HA and data replication resources for each queue manager can again begin replication.
AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'
AMQ3598W: HA status for queue manager QM_HA is 'Synchronization in progress'
AMQ3599I: HA status for queue manager QM_HA is 'Normal'
As the data replication reconnects, the resuming appliance recognizes that the queue manager data is not up to date with the remote appliance and begins synchronization.
AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3595W: HA status for queue manager QM_HA is 'Inactive'
AMQ3902I: HA role for queue manager QM_HA is primary
Any queue managers that have their preferred location on the resumed appliance are stopped and demoted to secondary on the other appliance, so that they can be promoted to primary and started on their preferred location.
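The resume sequence reports the HA status moving from 'Inconsistent' through 'Synchronization in progress' to 'Normal'. As an illustration only, the Python sketch below collects the status values reported for each queue manager and checks that those three values appear in that order. The function names are hypothetical, and the pattern assumes the status messages are worded exactly as shown in this topic.

```python
import re

# Matches the status notifications as worded in this topic.
STATUS_RE = re.compile(r"HA status for queue manager (\S+) is '([^']+)'")

EXPECTED = ["Inconsistent", "Synchronization in progress", "Normal"]


def status_history(lines):
    """Return {queue manager name: [status values in the order reported]}."""
    history = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            history.setdefault(match.group(1), []).append(match.group(2))
    return history


def resynchronized(statuses):
    """True if the expected values occur in order (other values may be interleaved)."""
    remaining = iter(statuses)
    # 'step in remaining' advances the iterator, so order is enforced.
    return all(step in remaining for step in EXPECTED)
```

A check of this kind tolerates extra statuses, such as the brief 'Inactive' reported while a queue manager moves back to its preferred location.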
Losing communication with remote appliance
There are several different notifications that can be seen when communication is lost between HA appliances, depending on which network interfaces are affected.
- Replication interface
-
AMQ3576E: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is unavailable
AMQ3594E: HA status for queue manager QM_HA is 'Remote appliance unavailable'
Losing communication for replication is reported and results in a state change to Remote appliance unavailable. From this point the data on the secondary appliance becomes increasingly out of date.
- Heartbeat interfaces
-
AMQ3904E: HA heartbeat connection to secondary appliance 'Appliance1' using interface 'eth13' is unavailable
The appliance notifies when communication is unavailable for a single heartbeat link, that is, Primary interface (eth13) or Alternate interface (eth17). The appliance HA configuration can tolerate loss of a single heartbeat link.
AMQ3906S: HA secondary appliance 'Appliance1' is unavailable
AMQ3597E: HA status for queue manager QM_HA is 'Secondary appliance unavailable'
If both links are unavailable, additional notification is made of the remote appliance being unavailable, which is also reported for each affected HA queue manager.
AMQ3902I: HA role for queue manager QM_HA is primary
If the remote appliance is unavailable, all queue managers currently in the secondary role are promoted to primary and started.
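Because the loss of one heartbeat link is tolerated and the loss of both is not, it can help to track which links are currently down. The Python sketch below is one possible way to do that from a captured sequence of AMQ3904E (unavailable) and AMQ3905I (available) notifications. The function name heartbeat_links_down is hypothetical, and the default interface names eth13 and eth17 are taken from the example above and are assumed, not universal.

```python
import re

# Matches the heartbeat notifications AMQ3904E and AMQ3905I as worded in this topic.
HEARTBEAT_RE = re.compile(
    r"AMQ390[45][EI]: HA heartbeat connection to secondary appliance '[^']+' "
    r"using interface '([^']+)' is (unavailable|available)"
)


def heartbeat_links_down(lines, links=("eth13", "eth17")):
    """Return the sorted names of heartbeat links whose latest report is 'unavailable'."""
    # Assume every link is up until a notification says otherwise.
    up = {name: True for name in links}
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match and match.group(1) in up:
            up[match.group(1)] = match.group(2) == "available"
    return sorted(name for name, is_up in up.items() if not is_up)
```

If the returned list contains one name, the configuration is running on a single heartbeat link; if it contains both, the AMQ3906S notification would be expected as well.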
Regaining communication with remote appliance
As with losing communication interfaces with the remote appliance, there are different notifications as each communications interface between HA appliances becomes available.
- Regaining a heartbeat interface
-
AMQ3905I: HA heartbeat connection to secondary appliance 'Appliance1' using interface 'eth13' is available
The appliance notifies when each heartbeat connection, that is, the Primary interface or the Alternate interface, becomes available.
- Regaining Replication
-
AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available
A notification is produced for each queue manager when the data replication connection is re-established.
Typically, this is followed shortly by the secondary appliance reporting the queue manager data as inconsistent and then synchronizing from the primary.
Partitioned behavior
AMQ3596S: HA status for queue manager QM_HA is 'Partitioned'
When the replication links recover, the data replication resources detect the deviation in queue manager data and report that the queue manager has become Partitioned. If heartbeat links are available, this will be reported on both appliances. Duplicate notifications are possible if both appliances simultaneously detect the deviation.
On detecting the partitioned state, the replication connection becomes unavailable.
Additionally, one of the running instances is stopped and demoted to the secondary role. Typically, the queue manager remains running on the preferred node.
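Since a partitioned queue manager can be reported more than once, a script that reacts to these notifications probably wants one entry per queue manager, not one per message. The following Python sketch keeps only the most recent status for each queue manager and lists those that are currently 'Partitioned'. The function name is hypothetical, and the pattern assumes the message wording shown in this topic.

```python
import re

# Matches the status notifications as worded in this topic.
STATUS_RE = re.compile(r"HA status for queue manager (\S+) is '([^']+)'")


def partitioned_queue_managers(lines):
    """Return sorted names of queue managers whose latest reported status is 'Partitioned'."""
    latest = {}
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            # Later notifications overwrite earlier ones, so duplicates collapse.
            latest[match.group(1)] = match.group(2)
    return sorted(name for name, status in latest.items() if status == "Partitioned")
```

A queue manager drops out of the result as soon as a later status, such as 'Inconsistent' during resolution, is seen for it.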
Resolving a partitioned state
AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3902I: HA role for queue manager QM_HA is primary
If the makehaprimary command is run on the current secondary appliance, the queue manager instance is demoted on the primary appliance to allow it to be promoted, and started, where the command has been issued.
AMQ3577I: HA replication to remote appliance 'Appliance1' for queue manager 'QM_HA' using interface 'eth21' is available
AMQ3592E: HA status for queue manager QM_HA is 'Inconsistent'
AMQ3598W: HA status for queue manager QM_HA is 'Synchronization in progress'
AMQ3599I: HA status for queue manager QM_HA is 'Normal'
Data replication is then re-established between the appliances, with the data to be replaced marked as Inconsistent, allowing synchronization from the chosen primary node.
AMQ3595W: HA status for queue manager QM_HA is 'Inactive'
AMQ3903I: HA role for queue manager QM_HA is secondary
AMQ3902I: HA role for queue manager QM_HA is primary
After synchronization has completed, if makehaprimary was not issued on the preferred node, the queue manager is moved back to the preferred node.